Every data interchange decision eventually comes down to the same question: CSV, JSON, or XML? The answer depends on exactly three things: whether your data is flat or nested, whether you need type information, and who (or what) is going to read the file.

Each format represents a fundamentally different philosophy about data. CSV says "data is a table." JSON says "data is a tree of typed values." XML says "data is a tree of annotated, validated elements." There's no universally correct choice, but there's almost always a clearly wrong one for any given situation. Choosing the wrong format creates conversion headaches, parsing bugs, and integration friction that compounds over time.

CSV: Deceptively Simple, Surprisingly Broken

CSV (Comma-Separated Values) looks like the simplest format imaginable: rows of data, columns separated by commas, done. But CSV has no formal standard. RFC 4180 is the closest thing, and it was published in 2005 — decades after CSV was already everywhere. Most real-world CSV files don't fully comply with it.

The core problems are all edge cases that become daily annoyances:

  • Delimiter ambiguity. Commas inside field values must be quoted, but not all producers quote them. European locales use commas as decimal separators (3,14 not 3.14), so many European tools use semicolons as the CSV delimiter instead. Some files use tabs (TSV). You often can't tell which delimiter a file uses without inspecting it.
  • No type information. Every value in CSV is a string. The number 007 becomes 7 when Excel interprets it as numeric. ZIP codes, phone numbers, and part numbers with leading zeros are routinely destroyed by spreadsheet software that "helpfully" detects types. Dates are even worse: is 01/02/03 January 2, February 1, or 2001-02-03?
  • No nesting. CSV is strictly two-dimensional. If your data has hierarchy (a customer with multiple orders, each with multiple items), you either flatten it with repeating columns, use multiple CSV files with foreign keys, or give up and use JSON.
  • Encoding chaos. CSV files have no standard way to declare their character encoding. UTF-8 is common now, but legacy files in Windows-1252, Latin-1, or Shift-JIS are everywhere. Excel on Windows defaults to the system's locale encoding, not UTF-8 — unless you add a UTF-8 BOM (byte order mark) at the start of the file, which some parsers treat as data.
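
The delimiter-ambiguity problem above can be partially worked around in practice. A sketch using Python's standard library: csv.Sniffer guesses the delimiter from the file's content (it's a heuristic, so always verify against real data):

```python
import csv
import io

# The same record as exported by a US tool (comma delimiter, quoted value)
# and by a European tool (semicolon delimiter, comma as decimal separator).
comma_file = 'name,price\nwidget,"3,14"\n'
semicolon_file = 'name;price\nwidget;3,14\n'

def read_rows(text):
    """Guess the delimiter from the content, then parse with it."""
    dialect = csv.Sniffer().sniff(text)
    return list(csv.reader(io.StringIO(text), dialect))

print(read_rows(comma_file))      # [['name', 'price'], ['widget', '3,14']]
print(read_rows(semicolon_file))  # [['name', 'price'], ['widget', '3,14']]
```

Both files parse to the same rows once the delimiter is detected; without sniffing, parsing the semicolon file with a comma reader would silently produce one-column rows.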

When CSV Is the Right Choice

Despite its problems, CSV is the right choice when: your data is genuinely tabular (rows and columns, no nesting), you need to import into or export from a spreadsheet, you're working with machine learning tools that expect flat datasets, or you need maximum compatibility with legacy systems. CSV is also the fastest format to parse at scale — streaming line-by-line through a 10GB CSV file uses almost no memory, while a 10GB JSON file needs a streaming parser or won't fit in memory at all.
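
The streaming point is easy to see directly: Python's csv.reader pulls rows from a file object lazily, so memory use stays flat regardless of file size. A minimal sketch, with an in-memory buffer standing in for a large file:

```python
import csv
import io

# A CSV source can be consumed row by row: only one line is ever held
# in memory, which is why multi-gigabyte CSV files are cheap to scan.
data = io.StringIO("id,amount\n1,10\n2,20\n3,30\n")

total = 0
reader = csv.reader(data)
next(reader)                # skip the header row
for row in reader:          # rows are produced lazily, one at a time
    total += int(row[1])

print(total)  # 60
```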

If you're exchanging data between systems and it's flat, CSV to JSON conversion adds structure without losing information. Going the other direction — JSON to CSV — works only if the JSON is already flat or you accept losing nested structure through flattening.
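
A minimal flat CSV-to-JSON conversion needs nothing beyond the standard library. Note that every value arrives as a string, since CSV carries no type information:

```python
import csv
import io
import json

csv_text = "sku,name,qty\nA-001,widget,3\nA-002,gadget,7\n"

# Each CSV row becomes a JSON object keyed by the header row.
rows = list(csv.DictReader(io.StringIO(csv_text)))
print(json.dumps(rows, indent=2))

# Every value is still a string ("qty": "3") -- CSV has no types,
# so numeric conversion is up to you.
```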

JSON: The Web's Data Format

JSON (JavaScript Object Notation) emerged from JavaScript but is now language-agnostic and ubiquitous. Every modern programming language has a JSON parser in its standard library. Every REST API speaks JSON. Most NoSQL databases store JSON natively.

JSON has six types: string, number, boolean, null, array, and object (key-value map). This small type system is both JSON's strength and its limitation. Numbers exist (unlike CSV), but there's no distinction between integer and float. Booleans exist. But there's no date type — this is JSON's eternal problem. Dates are typically serialized as ISO 8601 strings ("2026-03-19T14:30:00Z"), but JSON itself mandates no date convention. Some APIs use Unix timestamps. Some use American date strings. You have to know what the producer intended.
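
In practice, producer and consumer must agree on a date convention out of band. One common pattern, sketched with Python's standard library, is to serialize datetimes as ISO 8601 strings via the `default` hook:

```python
import json
from datetime import datetime, timezone

# json.dumps has no idea what to do with a datetime by itself...
event = {"name": "launch", "at": datetime(2026, 3, 19, 14, 30, tzinfo=timezone.utc)}

# ...so a common convention is to serialize it as an ISO 8601 string
# via the `default` hook. The receiver must know to parse it back.
payload = json.dumps(event, default=lambda v: v.isoformat())
print(payload)  # {"name": "launch", "at": "2026-03-19T14:30:00+00:00"}

# Round-tripping requires an explicit parse on the other side:
restored = datetime.fromisoformat(json.loads(payload)["at"])
```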

What JSON Does Well

Nested structures. JSON naturally represents hierarchical data: a customer object containing an array of order objects, each containing an array of item objects. This maps directly to how most applications model data internally, which is why JSON dominates API responses — the data structure in the API response matches the data structure in memory.

Readability. Formatted JSON is easy to read and hand-edit. Every developer can glance at a JSON file and understand its structure immediately, which matters for config files and data debugging.

Ubiquity. You don't need special libraries. JSON.parse() in JavaScript, json.loads() in Python, json.Unmarshal() in Go — every language has native JSON support.

JSON's Rough Edges

No comments. Standard JSON does not allow comments. This is a deliberate design decision by Douglas Crockford, but it makes JSON config files harder to document. JSON5 and JSONC add comments, but they're non-standard extensions that many parsers reject.

Strict syntax. A trailing comma after the last item in an array or object is a syntax error. A missing quote on a key is a syntax error. These trip up humans hand-editing JSON files constantly.
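
Both failure modes are easy to demonstrate with a standard JSON parser:

```python
import json

# A trailing comma is enough to make a standard JSON parser reject input:
try:
    json.loads('{"items": [1, 2, 3,]}')
except json.JSONDecodeError as err:
    print("rejected:", err.msg)

# Single quotes and unquoted keys fail too -- JSON is stricter than
# JavaScript object literal syntax:
try:
    json.loads("{name: 'x'}")
except json.JSONDecodeError as err:
    print("rejected:", err.msg)
```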

No schema enforcement. JSON has no built-in way to say "this field must be a string" or "this array must contain objects with these keys." JSON Schema exists as a separate specification, but it's opt-in and not universally used.

Large files are memory-hungry. JSON requires parsing the entire document to validate it. For gigabyte-scale data, this is a problem. JSONL (JSON Lines) solves this: one JSON object per line, parseable one line at a time. Converting CSV to JSON for large datasets? Consider JSONL instead of a single JSON array.
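
A minimal JSONL round trip, with an in-memory buffer standing in for a large file:

```python
import io
import json

records = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 3, "v": "c"}]

# Write JSON Lines: one complete JSON object per line, no enclosing array.
buf = io.StringIO()
for rec in records:
    buf.write(json.dumps(rec) + "\n")

# Read it back one line at a time -- memory use stays flat no matter
# how many records the file holds.
buf.seek(0)
restored = [json.loads(line) for line in buf]
assert restored == records
```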

XML: Verbose, Powerful, Still Everywhere

XML (Extensible Markup Language) is the format everyone loves to hate and can't stop using. It's verbose — <name>John</name> versus JSON's "name": "John" — and its ecosystem is complex. But XML does things that neither CSV nor JSON can do natively, which is why it remains dominant in enterprise software, government systems, and document formats.

What Only XML Can Do

Schema validation. XSD (XML Schema Definition) lets you define exactly what a valid document looks like: which elements are required, what type each attribute must be, how many child elements are allowed, and what order they must appear in. A parser can validate an XML document against its schema and reject it before your application ever sees it. JSON Schema is getting there, but XSD is battle-tested across two decades of enterprise use.

Namespaces. XML namespaces let you combine elements from different vocabularies in one document without name collisions. An invoice XML can contain elements from your company's schema, the shipping provider's schema, and an industry-standard tax schema — all in one document, unambiguously identified. This is powerful for interoperability but confusing to work with.

XSLT transformation. XML has a standardized transformation language (XSLT) that can convert one XML structure into another, or into HTML, or into plain text. You can transform XML without writing application code — just supply an XSLT stylesheet. No comparably standardized equivalent exists for JSON or CSV.

Mixed content. XML can interleave text with child elements: <p>This is <b>bold</b> text</p>. JSON can't represent this naturally. This is why XHTML, DOCX (a ZIP of XML files), and SVG (vector graphics) all use XML — document markup requires mixed content.
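
Python's ElementTree makes the mixed-content model visible: the interleaved text lives in the .text and .tail slots around each child element, which is exactly the structure JSON has no slot for:

```python
import xml.etree.ElementTree as ET

# Mixed content: text interleaved with child elements.
para = ET.fromstring("<p>This is <b>bold</b> text</p>")

print(repr(para.text))     # 'This is '  (text before the first child)
print(para[0].tag)         # 'b'
print(para[0].text)        # 'bold'
print(repr(para[0].tail))  # ' text'     (text after the child element)

# The flattened text of the whole paragraph:
print("".join(para.itertext()))  # This is bold text
```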

Where XML Still Dominates

SOAP web services, RSS and Atom feeds, SVG graphics, XHTML, Microsoft Office documents (DOCX, XLSX, PPTX are all ZIP archives of XML files), Maven build files, Android layouts, SAML authentication, and most government data interchange standards. If you're integrating with banks, healthcare systems, or government APIs, you're probably dealing with XML.

When converting XML to JSON, be aware that XML attributes and elements don't map cleanly to JSON. An XML element like <price currency="USD">29.99</price> has both an attribute (currency) and text content (29.99). JSON has no concept of attributes, so converters must make arbitrary choices about how to represent them (usually as special keys like @currency or _attributes). Converting back from JSON to XML may not reproduce the original structure.
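
A sketch of the problem using Python's ElementTree. The @currency / #text convention shown here is one common converter choice, not a standard:

```python
import json
import xml.etree.ElementTree as ET

elem = ET.fromstring('<price currency="USD">29.99</price>')

# XML cleanly separates the attribute from the text content:
print(elem.attrib["currency"])  # USD
print(elem.text)                # 29.99

# JSON has no attribute concept, so a converter must invent a convention.
# "@" for attributes and "#text" for content is one common choice:
as_json = {elem.tag: {"@currency": elem.attrib["currency"], "#text": elem.text}}
print(json.dumps(as_json))  # {"price": {"@currency": "USD", "#text": "29.99"}}
```

A different converter might emit {"price": {"currency": "USD", "value": "29.99"}} from the same input, which is why round-tripping XML through JSON is unreliable.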

Brief Mentions: YAML and TOML

YAML emphasizes human readability: indentation instead of braces, support for comments, and multi-line strings without escaping. As of the 1.2 spec it is, with minor caveats, a JSON superset (every valid JSON document is also valid YAML). It's the standard for Kubernetes manifests, CI/CD pipelines (GitHub Actions, GitLab CI), Ansible playbooks, and many config files. The footgun: YAML is extremely sensitive to indentation, and certain unquoted values are auto-typed in surprising ways. Norway was famously parsed as false under YAML 1.1 because NO is a boolean literal. The unquoted string 3.10 becomes the float 3.1. Always quote strings that might be misinterpreted. Convert JSON to YAML when you need a human-editable config file; convert YAML to JSON when you need machine-parseable output.
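
An illustrative config fragment; the comments describe how a YAML 1.1-era parser (e.g., PyYAML's defaults) typically resolves each unquoted value — exact behavior varies by parser and YAML version:

```yaml
country: NO         # resolved as boolean false, not the string "NO"
version: 3.10       # resolved as the float 3.1
zip: 01234          # may be resolved as an octal integer, destroying the leading zero
safe_country: "NO"  # quoting forces a string -- always quote ambiguous values
```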

TOML (Tom's Obvious Minimal Language) is a simpler alternative to YAML designed specifically for config files. It looks like an INI file with types: port = 8080, debug = true, [database] for sections. TOML supports dates and times natively (one of the few common formats that does), avoids YAML's indentation footguns, and is the standard for Rust projects (Cargo.toml), Python packaging (pyproject.toml), and Hugo. Use TOML for config files when YAML's complexity isn't needed.

Format Comparison Table

| Feature | CSV | JSON | XML | YAML | TOML |
| --- | --- | --- | --- | --- | --- |
| Human Readable | For simple data | Yes | Verbose but clear | Excellent | Excellent |
| Nested Data | No | Yes | Yes | Yes | Limited |
| Data Types | None (all strings) | 6 types (no date) | Via schema | 11+ types (incl. date) | Strong (incl. date) |
| Schema Validation | No | JSON Schema (opt-in) | XSD/DTD (mature) | No standard | No standard |
| Comments | No | No | Yes (<!-- -->) | Yes (#) | Yes (#) |
| File Size | Smallest | Medium | Largest | Medium | Medium |
| Parse Speed (large files) | Fastest | Fast | Slowest | Slow | Fast |
| Streaming Parse | Native (line by line) | JSONL only | SAX/StAX | No | No |
| Best For | Tabular data, spreadsheets, ML | APIs, config, NoSQL | Enterprise, docs, validation | Config, DevOps | Config files |

What Gets Lost When Converting Between Formats

Format conversion is not lossless. Every conversion from a richer format to a simpler one discards information.

CSV to JSON: Gains structure. Each row becomes an object with column names as keys. No information is lost. The JSON is strictly more expressive than the CSV.

JSON to CSV: Loses nesting. Nested objects must be flattened (e.g., address.city becomes a column) or serialized as JSON strings inside cells. Arrays of varying length are especially problematic — you can't represent a customer with 2 orders and another with 17 orders in the same flat table without either wasting columns or repeating rows.
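
Dotted-path keys are one common (but non-standard) flattening convention. A sketch with a hypothetical helper named flatten:

```python
import csv
import io

def flatten(obj, prefix=""):
    """Flatten nested dicts into dotted-path keys (one common convention)."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, path + "."))
        else:
            flat[path] = value
    return flat

record = {"name": "Ada", "address": {"city": "London", "zip": "NW1"}}
row = flatten(record)
print(row)  # {'name': 'Ada', 'address.city': 'London', 'address.zip': 'NW1'}

# Arrays of varying length have no clean flat representation -- that
# is exactly the information a JSON-to-CSV conversion gives up.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=row.keys())
writer.writeheader()
writer.writerow(row)
```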

XML to JSON: Loses the attribute/element distinction, processing instructions, namespaces, and comments. If an XML element has both attributes and text content, the converter must invent a convention to represent both in JSON. There's no standard convention, so different tools produce different JSON structures from the same XML.

JSON to XML: Gains verbosity, loses simplicity. JSON arrays don't have a natural XML equivalent (converters typically wrap each item in an <item> element). JSON's null has no standard XML representation. But the structure is preserved — this direction is less lossy than XML-to-JSON.

CSV to XML or JSON to YAML: These are expansions — the target format can represent everything the source format can, plus more. They're essentially lossless.

How to Choose: A Decision Framework

Start with the simplest format that handles your data structure:

  1. Is your data flat (rows and columns with no nesting)? Use CSV. It's the fastest to parse, the smallest, and universally supported by spreadsheets, databases, and data tools. Convert to XLSX if you need to share with non-technical users.
  2. Does your data have nesting or mixed types? Use JSON. If you're building an API, use JSON. If you're storing documents in a NoSQL database, use JSON. If you're exchanging data between modern web services, use JSON. Full stop.
  3. Do you need formal validation, namespaces, or mixed content? Use XML. If you're building enterprise integrations, healthcare document exchange (HL7 CDA), financial reporting (XBRL), or document markup, XML's power is worth its verbosity.
  4. Is this a config file for a human to edit? Use TOML for simple configs, YAML for complex ones. Never use JSON for human-edited config files — the lack of comments alone is disqualifying.

The format wars are over: they all won, in different niches. CSV owns tabular data and spreadsheet interchange. JSON owns APIs and web application data. XML owns enterprise integration and document markup. YAML owns DevOps configuration. TOML is carving out a niche in application config.

The practical question is usually not "which format should I invent from scratch" but "I have data in format A and need it in format B." When that happens, understand what you're gaining and losing in the conversion. CSV-to-JSON only adds structure. JSON-to-CSV loses nesting. XML-to-JSON loses attributes. As long as you know what's being discarded, you can make an informed choice about whether the conversion is acceptable for your use case.