CSV-to-JSON is one of the most common transformations in data work — exporting from spreadsheets, prepping API payloads, seeding NoSQL databases. Python has it built-in. The hard parts are not the conversion itself but the messy realities of CSV: inconsistent delimiters, encoding mismatches, quoted fields with embedded commas, and type inference (is "42" a number or a string?).

Method 1: Built-in csv + json (zero dependencies)

The standard library handles CSV cleanly: csv.DictReader reads each row as a dict keyed by the header row, and json.dump writes the resulting list of dicts out.

import csv
import json

def csv_to_json(csv_path: str, json_path: str) -> None:
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        rows = list(reader)
    with open(json_path, "w", encoding="utf-8") as out:
        json.dump(rows, out, indent=2, ensure_ascii=False)

csv_to_json("users.csv", "users.json")

Two important details:

  • newline="" — required when opening CSV for reading. Without it, the csv module mishandles embedded newlines in quoted fields.
  • encoding="utf-8" — required for non-ASCII characters. If the source CSV is Windows-1252 or Latin-1 (common for files exported from older Excel versions), you'll get UnicodeDecodeError.
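When you can't be sure which encoding a file uses, a common pattern is to try UTF-8 first and fall back to Windows-1252. The sketch below is illustrative (the read_rows name and the two-encoding fallback order are assumptions, not from the original); utf-8-sig also transparently strips an Excel BOM:

```python
import csv

def read_rows(csv_path: str) -> list[dict]:
    # Try UTF-8 first (utf-8-sig also strips an Excel BOM);
    # fall back to Windows-1252 for legacy Excel exports.
    for enc in ("utf-8-sig", "cp1252"):
        try:
            with open(csv_path, newline="", encoding=enc) as f:
                return list(csv.DictReader(f))
        except UnicodeDecodeError:
            continue
    raise ValueError(f"could not decode {csv_path} as UTF-8 or cp1252")
```

Note that cp1252 will happily decode almost any byte sequence, so it should come last; a genuinely UTF-8 file decoded as cp1252 produces mojibake rather than an error.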

Type coercion is manual with the stdlib. By default everything is a string. To convert numbers:

def coerce_value(v: str):
    try:
        return int(v)
    except ValueError:
        try:
            return float(v)
        except ValueError:
            return v

rows = [{k: coerce_value(v) for k, v in row.items()} for row in rows]

Method 2: pandas (best for type inference + nested output)

pandas does type inference automatically and gives you several JSON output shapes (records, index, columns).

pip install pandas

import pandas as pd

def csv_to_json(csv_path: str, json_path: str) -> None:
    df = pd.read_csv(csv_path)
    df.to_json(json_path, orient="records", indent=2)

csv_to_json("users.csv", "users.json")

Output shapes:

# orient="records": [{col: val}, {col: val}, ...]  — most common
# orient="columns": {col: {row_idx: val, ...}, ...}
# orient="index":   {row_idx: {col: val, ...}, ...}
# orient="split":   {"columns": [...], "index": [...], "data": [[...]]}
# orient="values":  [[...], [...], ...]

pandas auto-detects column types: integers stay integers, floats stay floats, dates can be parsed with parse_dates=. Use pandas when you need real types (not strings) in the JSON output.
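A quick illustration of both knobs, using an in-memory CSV (the sample data is invented): parse_dates= turns a column into real datetimes, while dtype=str switches inference off entirely.

```python
import io

import pandas as pd

csv_text = "id,signup\n007,2024-01-15\n"

# Default inference: "007" becomes the integer 7, signup stays a string.
df = pd.read_csv(io.StringIO(csv_text))

# parse_dates= gives a real datetime column.
dated = pd.read_csv(io.StringIO(csv_text), parse_dates=["signup"])

# dtype=str keeps every column (including "007") as strings.
raw = pd.read_csv(io.StringIO(csv_text), dtype=str)
```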

The downside: pandas is heavy (~30MB install) and slow to import. For one-shot scripts on small files, the stdlib is faster.

Method 3: ChangeThisFile API (for unpredictable inputs)

If you're processing CSVs from third parties — different exporters, different delimiters, different encodings — the API absorbs the inconsistency. Get a free API key.

import requests

API_KEY = "sk_test_your_key_here"

def csv_to_json(csv_path: str, json_path: str) -> None:
    with open(csv_path, "rb") as f:
        response = requests.post(
            "https://changethisfile.com/v1/convert",
            headers={"Authorization": f"Bearer {API_KEY}"},
            files={"file": f},
            data={"source": "csv", "target": "json"},
            timeout=30,
        )
    response.raise_for_status()
    with open(json_path, "wb") as out:
        out.write(response.content)

csv_to_json("messy_export.csv", "clean.json")

The API auto-detects delimiters (comma, semicolon, tab, pipe), handles UTF-8 / Windows-1252 / Latin-1 encoding, and produces well-formed JSON even when the input CSV has minor errors (extra blank rows, inconsistent quoting). For batch jobs of user uploads, this saves a lot of error-handling code.

When to use each

Approach            | Best for                                   | Tradeoff
csv + json (stdlib) | One-off scripts, predictable CSV, no deps  | Manual type coercion
pandas              | Data analysis, type-aware output, shapes   | Heavy dependency, slow import
ChangeThisFile API  | User uploads, varied formats, batch jobs   | Per-call cost, network call

Common pitfalls

  • Excel exports use Windows-1252 by default. Open with encoding="cp1252" or save as UTF-8 in Excel before exporting.
  • Excel adds a BOM to UTF-8 CSV. Use encoding="utf-8-sig" to handle the byte-order mark transparently.
  • Numbers like "007" get parsed as 7. If you need to preserve leading zeros (zip codes, IDs), turn off type inference: pd.read_csv("file.csv", dtype=str).
  • Nested objects are not native to CSV. If your CSV has dotted column names like "address.city", you have to manually un-flatten:
def unflatten(row):
    """Turn {"address.city": v} into {"address": {"city": v}}."""
    out = {}
    for k, v in row.items():
        keys = k.split(".")
        d = out
        # Walk/create an intermediate dict for every segment but the last.
        for kk in keys[:-1]:
            d = d.setdefault(kk, {})
        d[keys[-1]] = v
    return out
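A quick check of what the helper produces, with invented sample data (the function is repeated here so the snippet runs standalone):

```python
def unflatten(row):
    # Same helper as above, repeated so this snippet is self-contained.
    out = {}
    for k, v in row.items():
        keys = k.split(".")
        d = out
        for kk in keys[:-1]:
            d = d.setdefault(kk, {})
        d[keys[-1]] = v
    return out

flat = {"name": "Ada", "address.city": "Oslo", "address.zip": "0150"}
nested = unflatten(flat)
# {"name": "Ada", "address": {"city": "Oslo", "zip": "0150"}}
```

Apply it per row before json.dump and the dotted columns come out as nested objects.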

For your own clean CSVs, the stdlib is the right tool. For data analysis with type inference, pandas. For accepting CSV uploads from users where you can't predict the input quality, the API absorbs the variability so your code stays simple. Free tier gives 1,000 conversions/month.