Every data interchange decision eventually comes down to the same question: CSV, JSON, or XML? The answer depends on exactly three things: whether your data is flat or nested, whether you need type information, and who (or what) is going to read the file.

Each format represents a fundamentally different philosophy about data. CSV says "data is a table." JSON says "data is a tree of typed values." XML says "data is a tree of annotated, validated elements." There's no universally correct choice, but there's almost always a clearly wrong one for any given situation. Choosing the wrong format creates conversion headaches, parsing bugs, and integration friction that compounds over time.

CSV: Deceptively Simple, Surprisingly Broken

CSV (Comma-Separated Values) looks like the simplest format imaginable: rows of data, columns separated by commas, done. But CSV has no formal standard. RFC 4180 is the closest thing, and it was published in 2005 — decades after CSV was already everywhere. Most real-world CSV files don't fully comply with it.

The core problems are all edge cases that become daily annoyances:

  • Delimiter ambiguity. Commas inside field values must be quoted, but not all producers quote them. European locales use commas as decimal separators (3,14 not 3.14), so many European tools use semicolons as the CSV delimiter instead. Some files use tabs (TSV). You often can't tell which delimiter a file uses without inspecting it.
  • No type information. Every value in CSV is a string. The number 007 becomes 7 when Excel interprets it as numeric. ZIP codes, phone numbers, and part numbers with leading zeros are routinely destroyed by spreadsheet software that "helpfully" detects types. Dates are even worse: is 01/02/03 January 2, February 1, or 2001-02-03?
  • No nesting. CSV is strictly two-dimensional. If your data has hierarchy (a customer with multiple orders, each with multiple items), you either flatten it with repeating columns, use multiple CSV files with foreign keys, or give up and use JSON.
  • Encoding chaos. CSV files have no standard way to declare their character encoding. UTF-8 is common now, but legacy files in Windows-1252, Latin-1, or Shift-JIS are everywhere. Excel on Windows defaults to the system's locale encoding, not UTF-8 — unless you add a UTF-8 BOM (byte order mark) at the start of the file, which some parsers treat as data.
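
The delimiter-ambiguity problem above can be partially worked around in practice. A sketch using Python's standard library: csv.Sniffer guesses the delimiter from the file's content (it's a heuristic, so always verify against real data):

```python
import csv
import io

# The same record as exported by a US tool (comma delimiter, quoted value)
# and by a European tool (semicolon delimiter, comma as decimal separator).
comma_file = 'name,price\nwidget,"3,14"\n'
semicolon_file = 'name;price\nwidget;3,14\n'

def read_rows(text):
    """Guess the delimiter from the content, then parse with it."""
    dialect = csv.Sniffer().sniff(text)
    return list(csv.reader(io.StringIO(text), dialect))

print(read_rows(comma_file))      # [['name', 'price'], ['widget', '3,14']]
print(read_rows(semicolon_file))  # [['name', 'price'], ['widget', '3,14']]
```

Both files parse to the same rows once the delimiter is detected; without sniffing, parsing the semicolon file with a comma reader would silently produce one-column rows.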

When CSV Is the Right Choice

Despite its problems, CSV is the right choice when: your data is genuinely tabular (rows and columns, no nesting), you need to import into or export from a spreadsheet, you're working with machine learning tools that expect flat datasets, or you need maximum compatibility with legacy systems. CSV is also the fastest format to parse at scale — streaming line-by-line through a 10GB CSV file uses almost no memory, while a 10GB JSON file needs a streaming parser or won't fit in memory at all.
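
The streaming point is easy to see directly: Python's csv.reader pulls rows from a file object lazily, so memory use stays flat regardless of file size. A minimal sketch, with an in-memory buffer standing in for a large file:

```python
import csv
import io

# A CSV source can be consumed row by row: only one line is ever held
# in memory, which is why multi-gigabyte CSV files are cheap to scan.
data = io.StringIO("id,amount\n1,10\n2,20\n3,30\n")

total = 0
reader = csv.reader(data)
next(reader)                # skip the header row
for row in reader:          # rows are produced lazily, one at a time
    total += int(row[1])

print(total)  # 60
```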

If you're exchanging data between systems and it's flat, CSV to JSON conversion adds structure without losing information. Going the other direction — JSON to CSV — works only if the JSON is already flat or you accept losing nested structure through flattening.
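
A minimal flat CSV-to-JSON conversion needs nothing beyond the standard library. Note that every value arrives as a string, since CSV carries no type information:

```python
import csv
import io
import json

csv_text = "sku,name,qty\nA-001,widget,3\nA-002,gadget,7\n"

# Each CSV row becomes a JSON object keyed by the header row.
rows = list(csv.DictReader(io.StringIO(csv_text)))
print(json.dumps(rows, indent=2))

# Every value is still a string ("qty": "3") -- CSV has no types,
# so numeric conversion is up to you.
```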

JSON: The Web's Data Format

JSON (JavaScript Object Notation) emerged from JavaScript but is now language-agnostic and ubiquitous. Every modern programming language has a JSON parser in its standard library. Every REST API speaks JSON. Most NoSQL databases store JSON natively.

JSON has six types: string, number, boolean, null, array, and object (key-value map). This small type system is both JSON's strength and its limitation. Numbers exist (unlike CSV), but there's no distinction between integer and float. Booleans exist. But there's no date type — this is JSON's eternal problem. Dates are typically serialized as ISO 8601 strings ("2026-03-19T14:30:00Z"), but JSON itself mandates no date convention. Some APIs use Unix timestamps. Some use American date strings. You have to know what the producer intended.
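
In practice, producer and consumer must agree on a date convention out of band. One common pattern, sketched with Python's standard library, is to serialize datetimes as ISO 8601 strings via the `default` hook:

```python
import json
from datetime import datetime, timezone

# json.dumps has no idea what to do with a datetime by itself...
event = {"name": "launch", "at": datetime(2026, 3, 19, 14, 30, tzinfo=timezone.utc)}

# ...so a common convention is to serialize it as an ISO 8601 string
# via the `default` hook. The receiver must know to parse it back.
payload = json.dumps(event, default=lambda v: v.isoformat())
print(payload)  # {"name": "launch", "at": "2026-03-19T14:30:00+00:00"}

# Round-tripping requires an explicit parse on the other side:
restored = datetime.fromisoformat(json.loads(payload)["at"])
```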

What JSON Does Well

Nested structures. JSON naturally represents hierarchical data: a customer object containing an array of order objects, each containing an array of item objects. This maps directly to how most applications model data internally, which is why JSON dominates API responses — the data structure in the API response matches the data structure in memory.

Readability. Formatted JSON is easy to read and hand-edit. Every developer can glance at a JSON file and understand its structure immediately, which matters for config files and data debugging.

Ubiquity. You don't need special libraries. JSON.parse() in JavaScript, json.loads() in Python, json.Unmarshal() in Go — every language has native JSON support.

JSON's Rough Edges

No comments. Standard JSON does not allow comments. This is a deliberate design decision by Douglas Crockford, but it makes JSON config files harder to document. JSON5 and JSONC add comments, but they're non-standard extensions that many parsers reject.

Strict syntax. A trailing comma after the last item in an array or object is a syntax error. A missing quote on a key is a syntax error. These trip up humans hand-editing JSON files constantly.
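
Both failure modes are easy to demonstrate with a standard JSON parser:

```python
import json

# A trailing comma is enough to make a standard JSON parser reject input:
try:
    json.loads('{"items": [1, 2, 3,]}')
except json.JSONDecodeError as err:
    print("rejected:", err.msg)

# Single quotes and unquoted keys fail too -- JSON is stricter than
# JavaScript object literal syntax:
try:
    json.loads("{name: 'x'}")
except json.JSONDecodeError as err:
    print("rejected:", err.msg)
```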

No schema enforcement. JSON has no built-in way to say "this field must be a string" or "this array must contain objects with these keys." JSON Schema exists as a separate specification, but it's opt-in and not universally used.

Large files are memory-hungry. JSON requires parsing the entire document to validate it. For gigabyte-scale data, this is a problem. JSONL (JSON Lines) solves this: one JSON object per line, parseable one line at a time. Converting CSV to JSON for large datasets? Consider JSONL instead of a single JSON array.
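
A minimal JSONL round trip, with an in-memory buffer standing in for a large file:

```python
import io
import json

records = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 3, "v": "c"}]

# Write JSON Lines: one complete JSON object per line, no enclosing array.
buf = io.StringIO()
for rec in records:
    buf.write(json.dumps(rec) + "\n")

# Read it back one line at a time -- memory use stays flat no matter
# how many records the file holds.
buf.seek(0)
restored = [json.loads(line) for line in buf]
assert restored == records
```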

XML: Verbose, Powerful, Still Everywhere

XML (Extensible Markup Language) is the format everyone loves to hate and can't stop using. It's verbose — <name>John</name> versus JSON's "name": "John" — and its ecosystem is complex. But XML does things that neither CSV nor JSON can do natively, which is why it remains dominant in enterprise software, government systems, and document formats.

What Only XML Can Do

Schema validation. XSD (XML Schema Definition) lets you define exactly what a valid document looks like: which elements are required, what type each attribute must be, how many child elements are allowed, and what order they must appear in. A parser can validate an XML document against its schema and reject it before your application ever sees it. JSON Schema is getting there, but XSD is battle-tested across two decades of enterprise use.

Namespaces. XML namespaces let you combine elements from different vocabularies in one document without name collisions. An invoice XML can contain elements from your company's schema, the shipping provider's schema, and an industry-standard tax schema — all in one document, unambiguously identified. This is powerful for interoperability but confusing to work with.

XSLT transformation. XML has a standardized transformation language (XSLT) that can convert one XML structure into another, or into HTML, or into plain text. You can transform XML without writing application code — just supply an XSLT stylesheet. No comparably standardized equivalent exists for JSON or CSV.

Mixed content. XML can interleave text with child elements: <p>This is <b>bold</b> text</p>. JSON can't represent this naturally. This is why XHTML, DOCX (a ZIP of XML files), and SVG (vector graphics) all use XML — document markup requires mixed content.
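
Python's ElementTree makes the mixed-content model visible: the interleaved text lives in the .text and .tail slots around each child element, which is exactly the structure JSON has no slot for:

```python
import xml.etree.ElementTree as ET

# Mixed content: text interleaved with child elements.
para = ET.fromstring("<p>This is <b>bold</b> text</p>")

print(repr(para.text))     # 'This is '  (text before the first child)
print(para[0].tag)         # 'b'
print(para[0].text)        # 'bold'
print(repr(para[0].tail))  # ' text'     (text after the child element)

# The flattened text of the whole paragraph:
print("".join(para.itertext()))  # This is bold text
```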

Where XML Still Dominates

SOAP web services, RSS and Atom feeds, SVG graphics, XHTML, Microsoft Office documents (DOCX, XLSX, PPTX are all ZIP archives of XML files), Maven build files, Android layouts, SAML authentication, and most government data interchange standards. If you're integrating with banks, healthcare systems, or government APIs, you're probably dealing with XML.

When converting XML to JSON, be aware that XML attributes and elements don't map cleanly to JSON. An XML element like <price currency="USD">29.99</price> has both an attribute (currency) and text content (29.99). JSON has no concept of attributes, so converters must make arbitrary choices about how to represent them (usually as special keys like @currency or _attributes). Converting back from JSON to XML may not reproduce the original structure.
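
A sketch of the problem using Python's ElementTree. The @currency / #text convention shown here is one common converter choice, not a standard:

```python
import json
import xml.etree.ElementTree as ET

elem = ET.fromstring('<price currency="USD">29.99</price>')

# XML cleanly separates the attribute from the text content:
print(elem.attrib["currency"])  # USD
print(elem.text)                # 29.99

# JSON has no attribute concept, so a converter must invent a convention.
# "@" for attributes and "#text" for content is one common choice:
as_json = {elem.tag: {"@currency": elem.attrib["currency"], "#text": elem.text}}
print(json.dumps(as_json))  # {"price": {"@currency": "USD", "#text": "29.99"}}
```

A different converter might emit {"price": {"currency": "USD", "value": "29.99"}} from the same input, which is why round-tripping XML through JSON is unreliable.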

Brief Mentions: YAML and TOML

YAML emphasizes human readability: indentation instead of braces, support for comments, and multi-line strings without escaping. As of the 1.2 spec it is, with minor caveats, a JSON superset (every valid JSON document is also valid YAML). It's the standard for Kubernetes manifests, CI/CD pipelines (GitHub Actions, GitLab CI), Ansible playbooks, and many config files. The footgun: YAML is extremely sensitive to indentation, and certain unquoted values are auto-typed in surprising ways. Norway was famously parsed as false under YAML 1.1 because NO is a boolean literal. The unquoted string 3.10 becomes the float 3.1. Always quote strings that might be misinterpreted. Convert JSON to YAML when you need a human-editable config file; convert YAML to JSON when you need machine-parseable output.
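
An illustrative config fragment; the comments describe how a YAML 1.1-era parser (e.g., PyYAML's defaults) typically resolves each unquoted value — exact behavior varies by parser and YAML version:

```yaml
country: NO         # resolved as boolean false, not the string "NO"
version: 3.10       # resolved as the float 3.1
zip: 01234          # may be resolved as an octal integer, destroying the leading zero
safe_country: "NO"  # quoting forces a string -- always quote ambiguous values
```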

TOML (Tom's Obvious Minimal Language) is a simpler alternative to YAML designed specifically for config files. It looks like an INI file with types: port = 8080, debug = true, [database] for sections. TOML supports dates and times natively (one of the few common formats that does), avoids YAML's indentation footguns, and is the standard for Rust projects (Cargo.toml), Python packaging (pyproject.toml), and Hugo. Use TOML for config files when YAML's complexity isn't needed.

Format Comparison Table

| Feature | CSV | JSON | XML | YAML | TOML |
| --- | --- | --- | --- | --- | --- |
| Human Readable | For simple data | Yes | Verbose but clear | Excellent | Excellent |
| Nested Data | No | Yes | Yes | Yes | Limited |
| Data Types | None (all strings) | 6 types (no date) | Via schema | 11+ types (incl. date) | Strong (incl. date) |
| Schema Validation | No | JSON Schema (opt-in) | XSD/DTD (mature) | No standard | No standard |
| Comments | No | No | Yes (<!-- -->) | Yes (#) | Yes (#) |
| File Size | Smallest | Medium | Largest | Medium | Medium |
| Parse Speed (large files) | Fastest | Fast | Slowest | Slow | Fast |
| Streaming Parse | Native (line by line) | JSONL only | SAX/StAX | No | No |
| Best For | Tabular data, spreadsheets, ML | APIs, config, NoSQL | Enterprise, docs, validation | Config, DevOps | Config files |

What Gets Lost When Converting Between Formats

Format conversion is not lossless. Every conversion from a richer format to a simpler one discards information.

CSV to JSON: Gains structure. Each row becomes an object with column names as keys. No information is lost. The JSON is strictly more expressive than the CSV.

JSON to CSV: Loses nesting. Nested objects must be flattened (e.g., address.city becomes a column) or serialized as JSON strings inside cells. Arrays of varying length are especially problematic — you can't represent a customer with 2 orders and another with 17 orders in the same flat table without either wasting columns or repeating rows.
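
Dotted-path keys are one common (but non-standard) flattening convention. A sketch with a hypothetical helper named flatten:

```python
import csv
import io

def flatten(obj, prefix=""):
    """Flatten nested dicts into dotted-path keys (one common convention)."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, path + "."))
        else:
            flat[path] = value
    return flat

record = {"name": "Ada", "address": {"city": "London", "zip": "NW1"}}
row = flatten(record)
print(row)  # {'name': 'Ada', 'address.city': 'London', 'address.zip': 'NW1'}

# Arrays of varying length have no clean flat representation -- that
# is exactly the information a JSON-to-CSV conversion gives up.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=row.keys())
writer.writeheader()
writer.writerow(row)
```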

XML to JSON: Loses the attribute/element distinction, processing instructions, namespaces, and comments. If an XML element has both attributes and text content, the converter must invent a convention to represent both in JSON. There's no standard convention, so different tools produce different JSON structures from the same XML.

JSON to XML: Gains verbosity, loses simplicity. JSON arrays don't have a natural XML equivalent (converters typically wrap each item in an <item> element). JSON's null has no standard XML representation. But the structure is preserved — this direction is less lossy than XML-to-JSON.

CSV to XML or JSON to YAML: These are expansions — the target format can represent everything the source format can, plus more. They're essentially lossless.

How to Choose: A Decision Framework

Start with the simplest format that handles your data structure:

  1. Is your data flat (rows and columns with no nesting)? Use CSV. It's the fastest to parse, the smallest, and universally supported by spreadsheets, databases, and data tools. Convert to XLSX if you need to share with non-technical users.
  2. Does your data have nesting or mixed types? Use JSON. If you're building an API, use JSON. If you're storing documents in a NoSQL database, use JSON. If you're exchanging data between modern web services, use JSON. Full stop.
  3. Do you need formal validation, namespaces, or mixed content? Use XML. If you're building enterprise integrations, healthcare document exchange (HL7 CDA), financial reporting (XBRL), or document markup, XML's power is worth its verbosity.
  4. Is this a config file for a human to edit? Use TOML for simple configs, YAML for complex ones. Never use JSON for human-edited config files — the lack of comments alone is disqualifying.

The format wars are over: they all won, in different niches. CSV owns tabular data and spreadsheet interchange. JSON owns APIs and web application data. XML owns enterprise integration and document markup. YAML owns DevOps configuration. TOML is carving out a niche in application config.

The practical question is usually not "which format should I invent from scratch" but "I have data in format A and need it in format B." When that happens, understand what you're gaining and losing in the conversion. CSV-to-JSON only adds structure. JSON-to-CSV loses nesting. XML-to-JSON loses attributes. As long as you know what's being discarded, you can make an informed choice about whether the conversion is acceptable for your use case.