CSV-to-JSON is one of the most common transformations in data work — exporting from spreadsheets, prepping API payloads, seeding NoSQL databases. Python has it built-in. The hard parts are not the conversion itself but the messy realities of CSV: inconsistent delimiters, encoding mismatches, quoted fields with embedded commas, and type inference (is "42" a number or a string?).
Method 1: Built-in csv + json (zero dependencies)
The standard library handles CSV cleanly. csv.DictReader reads each row as a dict, and json.dump writes the resulting list to a file.
```python
import csv
import json

def csv_to_json(csv_path: str, json_path: str) -> None:
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        rows = list(reader)
    with open(json_path, "w", encoding="utf-8") as out:
        json.dump(rows, out, indent=2, ensure_ascii=False)

csv_to_json("users.csv", "users.json")
```
Two important details:
- newline="" — required when opening CSV for reading. Without it, the csv module mishandles embedded newlines in quoted fields.
- encoding="utf-8" — required for non-ASCII characters. If the source CSV is Windows-1252 or Latin-1 (common for files exported from older Excel versions), you'll get UnicodeDecodeError.
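When the source encoding is unknown, a common defensive pattern is to try UTF-8 first and fall back to Windows-1252. A minimal sketch (the helper name read_rows and the fallback order are my assumptions, not from the stdlib):

```python
import csv

def read_rows(csv_path: str) -> list[dict]:
    # Illustrative helper: try UTF-8 (with BOM handling via utf-8-sig)
    # first, then fall back to Windows-1252, which accepts most byte
    # sequences that are invalid UTF-8.
    for encoding in ("utf-8-sig", "cp1252"):
        try:
            with open(csv_path, newline="", encoding=encoding) as f:
                return list(csv.DictReader(f))
        except UnicodeDecodeError:
            continue
    raise ValueError(f"could not decode {csv_path}")
```

Note that cp1252 rarely raises, so it effectively acts as a last resort; for truly unknown encodings a detection library is a better bet.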
Type coercion is manual with the stdlib. By default everything is a string. To convert numbers:
```python
def coerce_value(v: str):
    try:
        return int(v)
    except ValueError:
        try:
            return float(v)
        except ValueError:
            return v

rows = [{k: coerce_value(v) for k, v in row.items()} for row in rows]
```
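Putting the pieces together, a self-contained round trip (reading from an in-memory string here for brevity; file-based code works the same way):

```python
import csv
import io
import json

def coerce_value(v: str):
    # Try int first, then float; otherwise leave the string untouched.
    try:
        return int(v)
    except ValueError:
        try:
            return float(v)
        except ValueError:
            return v

raw = "name,age,score\nAda,36,9.5\nLin,28,8.0\n"
rows = [
    {k: coerce_value(v) for k, v in row.items()}
    for row in csv.DictReader(io.StringIO(raw))
]
print(json.dumps(rows))
# [{"name": "Ada", "age": 36, "score": 9.5}, {"name": "Lin", "age": 28, "score": 8.0}]
```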
Method 2: pandas (best for type inference + nested output)
pandas does type inference automatically and gives you several JSON output shapes (records, index, columns).
```
pip install pandas
```

```python
import pandas as pd

def csv_to_json(csv_path: str, json_path: str) -> None:
    df = pd.read_csv(csv_path)
    df.to_json(json_path, orient="records", indent=2)

csv_to_json("users.csv", "users.json")
```
Output shapes:
```python
# orient="records": [{col: val}, {col: val}, ...] — most common
# orient="columns": {col: {row_idx: val, ...}, ...}
# orient="index":   {row_idx: {col: val, ...}, ...}
# orient="split":   {"columns": [...], "index": [...], "data": [[...]]}
# orient="values":  [[...], [...], ...]
```
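To make the two most common shapes concrete, here is the same two-row table rendered by hand in plain Python (no pandas needed; row indices become string keys, matching what to_json emits):

```python
import json

records = [{"name": "Ada", "age": 36}, {"name": "Lin", "age": 28}]

# orient="records" is just the list of row dicts
print(json.dumps(records))

# orient="columns": one dict per column, keyed by row index (as strings)
columns = {
    col: {str(i): row[col] for i, row in enumerate(records)}
    for col in records[0]
}
print(json.dumps(columns))
# {"name": {"0": "Ada", "1": "Lin"}, "age": {"0": 36, "1": 28}}
```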
pandas auto-detects column types: integers stay integers, floats stay floats, dates can be parsed with parse_dates=. Use pandas when you need real types (not strings) in the JSON output.
The downside: pandas is heavy (~30MB install) and slow to import. For one-shot scripts on small files, the stdlib is faster.
Method 3: ChangeThisFile API (for unpredictable inputs)
If you're processing CSVs from third parties — different exporters, different delimiters, different encodings — the API absorbs the inconsistency. Get a free API key.
```python
import requests

API_KEY = "sk_test_your_key_here"

def csv_to_json(csv_path: str, json_path: str) -> None:
    with open(csv_path, "rb") as f:
        response = requests.post(
            "https://changethisfile.com/v1/convert",
            headers={"Authorization": f"Bearer {API_KEY}"},
            files={"file": f},
            data={"source": "csv", "target": "json"},
            timeout=30,
        )
    response.raise_for_status()
    with open(json_path, "wb") as out:
        out.write(response.content)

csv_to_json("messy_export.csv", "clean.json")
```
The API auto-detects delimiters (comma, semicolon, tab, pipe), handles UTF-8 / Windows-1252 / Latin-1 encoding, and produces well-formed JSON even when the input CSV has minor errors (extra blank rows, inconsistent quoting). For batch jobs of user uploads, this saves a lot of error-handling code.
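Network calls fail transiently, so batch jobs usually want a retry wrapper around each conversion. A minimal sketch (with_retries and its defaults are my own helper, not part of the API):

```python
import time

def with_retries(convert, attempts=3, backoff=1.0):
    # Call `convert()` up to `attempts` times with exponential backoff.
    # `convert` is any zero-argument callable that raises on failure,
    # e.g. lambda: csv_to_json("a.csv", "a.json").
    for attempt in range(attempts):
        try:
            return convert()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: re-raise the last error
            time.sleep(backoff * (2 ** attempt))
```

In production you would likely narrow the except clause to network errors and HTTP 5xx responses rather than retrying on everything.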
When to use each
| Approach | Best for | Tradeoff |
|---|---|---|
| csv + json (stdlib) | One-off scripts, predictable CSV, no deps | Manual type coercion |
| pandas | Data analysis, type-aware output, multiple shapes | Heavy dependency, slow import |
| ChangeThisFile API | User uploads, varied formats, batch jobs | Per-call cost, network call |
Common pitfalls
- Excel exports use Windows-1252 by default. Open with encoding="cp1252" or save as UTF-8 in Excel before exporting.
- Excel adds a BOM to UTF-8 CSV. Use encoding="utf-8-sig" to handle the byte-order mark transparently.
- Numbers like "007" get parsed as 7 once type inference kicks in (pandas, or the coerce_value helper above). If you need to preserve leading zeros (zip codes, IDs), keep everything as strings: pd.read_csv("file.csv", dtype=str).
- Nested objects are not native to CSV. If your CSV has dotted column names like "address.city", you have to manually un-flatten:
```python
def unflatten(row):
    out = {}
    for k, v in row.items():
        keys = k.split(".")
        d = out
        for kk in keys[:-1]:
            d = d.setdefault(kk, {})
        d[keys[-1]] = v
    return out
```
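For example, a row with dotted headers becomes a nested object (repeating the helper here so the snippet runs on its own; the sample row is made up):

```python
def unflatten(row):
    # Rebuild nested dicts from dotted keys like "address.city".
    out = {}
    for k, v in row.items():
        keys = k.split(".")
        d = out
        for kk in keys[:-1]:
            d = d.setdefault(kk, {})
        d[keys[-1]] = v
    return out

row = {"name": "Ada", "address.city": "London", "address.zip": "NW1"}
print(unflatten(row))
# {'name': 'Ada', 'address': {'city': 'London', 'zip': 'NW1'}}
```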
For your own clean CSVs, the stdlib is the right tool. For data analysis with type inference, pandas. For accepting CSV uploads from users where you can't predict the input quality, the API absorbs the variability so your code stays simple. Free tier gives 1,000 conversions/month.