JSON is the most widely used data interchange format on the planet. Every major programming language has a JSON parser in its standard library. Every REST API returns JSON. Every NoSQL database speaks it natively. It's so ubiquitous that most developers use it without ever reading the specification — which, at 16 pages, is one of the shortest RFCs you'll find.
But that simplicity is precisely what makes JSON powerful. Douglas Crockford didn't invent a new format — he discovered a subset of JavaScript that was already being used informally for data exchange, gave it a name, formalized the grammar, and published it at json.org in 2002. The format was small enough to fit on a business card. Twenty-four years later, it processes trillions of API calls daily.
This guide covers everything a developer needs to know about JSON: its type system, its deliberate omissions, its security history, and how to work around its limitations without switching formats.
History: From JavaScript Subset to RFC 8259
Douglas Crockford first specified JSON in 2001 while working at State Software. The idea was simple: JavaScript already had object literal syntax that could represent structured data. If you stripped out functions, undefined, and other non-data constructs, you got a clean data format that any language could parse.
Crockford registered json.org in 2002 and published the grammar on a single page. The format gained traction rapidly because it was smaller than XML, easier to parse, and directly usable in JavaScript with eval() (a practice that later proved catastrophic for security). The IETF formalized JSON as RFC 4627 in 2006, updated it to RFC 7159 in 2014, and published the current definitive specification as RFC 8259 in 2017. In parallel, ECMA-404 (first published in 2013) standardized the same grammar.
A key philosophical decision: Crockford intentionally kept JSON minimal. When asked why JSON doesn't support comments, he said he removed them because people were using comments to embed parsing directives, which would have destroyed interoperability. JSON's lack of features is a feature.
The Six Data Types
JSON's entire type system fits in a short list:
| Type | Example | Notes |
|---|---|---|
| string | "hello world" | Must use double quotes. Supports Unicode escapes (\u0041). No single quotes. |
| number | 42, 3.14, -1e10 | No distinction between integer and float. No hex, octal, or Infinity/NaN. |
| boolean | true, false | Lowercase only. True and FALSE are syntax errors. |
| null | null | Represents absence of value. Not the same as a missing key. |
| array | [1, "two", null] | Ordered, heterogeneous. Can contain any type, including nested arrays. |
| object | {"key": "value"} | Unordered key-value pairs. Keys must be strings. Duplicate keys are technically allowed by the spec but discouraged — behavior is undefined. |
That's it. Six types, no extensions, no custom types. This constraint is what makes JSON universally parseable — every language can map these six types to native equivalents.
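The full type system fits in a single document. Here is one JSON text exercising all six types, round-tripped through the standard library parser (field names are illustrative):

```javascript
// One document exercising all six JSON types.
const doc = `{
  "string": "hello \\u0041",
  "number": -1e10,
  "boolean": true,
  "nothing": null,
  "array": [1, "two", null],
  "object": { "nested": { "key": "value" } }
}`;

const data = JSON.parse(doc);
data.string;               // "hello A" -- the \u0041 escape decodes to "A"
typeof data.number;        // "number" -- JSON has no separate int/float types
Array.isArray(data.array); // true -- arrays map to native arrays
data.nothing;              // null
```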
The Number Precision Trap
JSON numbers have no size limit in the specification, but implementations do. JavaScript's Number type is a 64-bit IEEE 754 float, which can only represent integers exactly up to 2^53 - 1 (9,007,199,254,740,991). Larger integers — like Twitter's 64-bit snowflake IDs or many database primary keys — lose precision silently when parsed in JavaScript.
This is why many APIs return large IDs as strings: "id": "1234567890123456789" instead of "id": 1234567890123456789. Python and Java handle big integers natively, but if any consumer in the chain is JavaScript-based, string encoding is the safe choice.
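The precision loss is silent, which is what makes it dangerous. A quick demonstration, plus the string workaround combined with BigInt (ES2020+) at the parsing boundary:

```javascript
// Integers past Number.MAX_SAFE_INTEGER (2^53 - 1) lose precision silently.
const max = Number.MAX_SAFE_INTEGER;          // 9007199254740991
const lossy = JSON.parse("9007199254740993"); // 9007199254740992 -- off by one, no error

// The string workaround: transport the ID as text, convert to BigInt
// at the boundary if you need arithmetic on it.
const safe = JSON.parse('{"id": "1234567890123456789"}');
const asBigInt = BigInt(safe.id);             // 1234567890123456789n -- exact
```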
Deliberate Omissions: Comments, Dates, Trailing Commas
JSON's most-requested features are the ones Crockford deliberately left out.
No Comments
Standard JSON has no comment syntax. Not //, not /* */, not #. Crockford removed comments from an early draft after observing that developers were using them to embed parser directives (like // @format: compact), which would have fractured the format into incompatible dialects.
Workarounds: JSON5 adds // and /* */ comments plus trailing commas, unquoted keys, and other human-friendly features. JSONC (JSON with Comments) adds only // and /* */ — VS Code uses JSONC for settings.json. Both are non-standard and rejected by strict parsers. If your config file needs comments, consider converting to YAML or TOML instead.
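If you only need to read a JSONC-style file from standard tooling, a comment stripper is often enough. This is a naive sketch, not production tooling (VS Code's own jsonc-parser handles more edge cases); it does track strings, so comment-like sequences inside string values survive:

```javascript
// Naive JSONC stripper: removes // and /* */ comments while leaving
// comment-like sequences inside strings untouched. A sketch only.
function stripJsonComments(text) {
  let out = "";
  let inString = false;
  let i = 0;
  while (i < text.length) {
    const ch = text[i];
    if (inString) {
      out += ch;
      if (ch === "\\") { out += text[i + 1]; i += 2; continue; } // keep escaped char
      if (ch === '"') inString = false;
      i++;
    } else if (ch === '"') {
      inString = true;
      out += ch;
      i++;
    } else if (ch === "/" && text[i + 1] === "/") {
      while (i < text.length && text[i] !== "\n") i++;  // drop to end of line
    } else if (ch === "/" && text[i + 1] === "*") {
      i += 2;
      while (i < text.length && !(text[i] === "*" && text[i + 1] === "/")) i++;
      i += 2;                                            // skip closing */
    } else {
      out += ch;
      i++;
    }
  }
  return out;
}

const jsonc = `{
  // server settings
  "host": "localhost", /* default */
  "port": 8080,
  "note": "not // a comment"
}`;
const config = JSON.parse(stripJsonComments(jsonc));
```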
No Date Type
JSON has no native date or datetime type. The convention — not a standard, just a widespread convention — is ISO 8601 strings: "2026-03-19T14:30:00Z". But some APIs use Unix timestamps (integers), some use milliseconds since epoch, and some use locale-specific strings like "March 19, 2026". Every JSON consumer must know which convention the producer used.
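In JavaScript, serialization of dates to ISO 8601 happens automatically (via Date.prototype.toJSON), but parsing back does not; a reviver is the standard mechanism. The field name and regex below are illustrative choices, not part of any standard:

```javascript
// JSON.stringify emits Dates as ISO 8601 strings automatically.
// JSON.parse has no date awareness, so a reviver converts them back.
const ISO_DATE = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d{3})?Z$/;

const payload = JSON.stringify({ createdAt: new Date(Date.UTC(2026, 2, 19, 14, 30)) });
// payload: {"createdAt":"2026-03-19T14:30:00.000Z"}

const restored = JSON.parse(payload, (key, value) =>
  typeof value === "string" && ISO_DATE.test(value) ? new Date(value) : value
);
// restored.createdAt is a Date object again
```

The reviver runs on every value, so an overly broad regex can accidentally convert ordinary strings; scope it to known date fields in real code.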
Among common data formats, TOML is notable for having native date and datetime types. If date handling is critical and you control both producer and consumer, converting from JSON to TOML gives you typed dates.
No Trailing Commas
This is valid JavaScript: [1, 2, 3,]. It is invalid JSON. The trailing comma after 3 causes a parse error. This constantly trips up developers hand-editing JSON files, especially when adding or removing items from arrays or objects. It also makes diffs noisier — adding a new last element requires modifying two lines (add a comma to the previous line, then add the new element).
JSON5 allows trailing commas. Standard JSON never will. Many editors (VS Code, JetBrains) can be configured to strip trailing commas on save for JSON files.
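The failure mode is easy to reproduce. JSON.parse rejects the trailing comma outright rather than silently ignoring it:

```javascript
// Strict JSON.parse rejects trailing commas with a SyntaxError.
let error = null;
try {
  JSON.parse("[1, 2, 3,]");
} catch (e) {
  error = e;                      // SyntaxError
}

const ok = JSON.parse("[1, 2, 3]"); // without the comma, parses fine
```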
Parsing Performance and Streaming
JSON parsing is fast. On modern hardware, a good parser (like simdjson) processes JSON at over 2 GB/s by exploiting SIMD instructions. Standard library parsers are slower but still handle typical API responses in microseconds.
The performance problem with JSON is not parse speed — it's memory. A JSON parser must typically load the entire document into memory to parse it, because the format has no line-level structure. A 5GB JSON array is one big string that must be read completely before any record can be accessed.
| Scenario | Standard JSON | JSONL |
|---|---|---|
| 100 records, 50KB | Parse entire file in memory. Fast. | Same speed. No advantage. |
| 10M records, 5GB | Needs 5GB+ RAM to parse. May OOM. | Process line by line. ~100KB RAM. |
| Append new record | Parse entire file, add to array, re-serialize. | Append one line. O(1). |
| Random access to record N | Parse entire file, index into array. | Seek to line N (if pre-indexed). |
For large datasets, converting JSON arrays to NDJSON/JSONL is often the single most impactful optimization. BigQuery, Elasticsearch, and most data pipeline tools accept JSONL natively.
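The conversion itself is trivial. This sketch works on in-memory strings to stay self-contained; a real pipeline would stream with readline over a file handle so only one line is in memory at a time:

```javascript
// JSON array -> JSONL and back. Each line is an independent JSON document,
// so a consumer can process, skip, or resume per line.
const records = [{ id: 1 }, { id: 2 }, { id: 3 }];

// One compact JSON document per line, no wrapping array
const jsonl = records.map((r) => JSON.stringify(r)).join("\n");

// Each line parses independently -- no need to load the whole file
const parsed = jsonl
  .split("\n")
  .filter(Boolean)
  .map((line) => JSON.parse(line));
```

Appending a record is just writing one more line to the end of the file, which is what makes JSONL O(1) for appends in the table above.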
simdjson and Modern Parsers
simdjson (2019) demonstrated that JSON parsing could be radically faster by using SIMD (Single Instruction, Multiple Data) CPU instructions to process 64 bytes of JSON simultaneously. Benchmarks show simdjson parsing at 2-4 GB/s versus ~300 MB/s for traditional parsers — a 6-10x improvement. The library is available in C++, Rust, Go, Python, and Node.js.
For most applications, the standard library parser is fast enough. simdjson matters when you're parsing JSON at infrastructure scale: log aggregation, real-time analytics, high-frequency API gateways.
JSON Schema: Validation Without XML's Verbosity
JSON Schema is a vocabulary for annotating and validating JSON documents. It defines the expected structure: which fields are required, what types they must be, what ranges are valid, and how nested objects should look. It's defined in its own JSON document (a schema is itself JSON), which makes it machine-readable and embeddable.
A minimal schema for a user object:
```json
{
  "type": "object",
  "required": ["name", "email"],
  "properties": {
    "name": { "type": "string", "minLength": 1 },
    "email": { "type": "string", "format": "email" },
    "age": { "type": "integer", "minimum": 0 }
  }
}
```

JSON Schema is widely supported: it's the basis for OpenAPI (Swagger) API specifications, VS Code's settings validation, and many form-building libraries. But unlike XML's XSD, JSON Schema adoption is opt-in. Most JSON in the wild has no schema, which means validation happens in application code rather than at the parsing layer.
The latest specification is JSON Schema 2020-12, which added vocabulary support and better conditional validation. Libraries exist for every major language: ajv (JavaScript), jsonschema (Python), everit-org/json-schema (Java).
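To make the semantics concrete, here is a toy validator covering only the keywords used in the schema above (type, required, properties, minLength, minimum; the "format" keyword is ignored). Real projects should use a full implementation such as ajv; this sketch just shows what a validator does:

```javascript
// Toy JSON Schema validator for a small keyword subset. Illustrative only.
function validate(schema, data) {
  const errors = [];
  if (schema.type === "object") {
    if (typeof data !== "object" || data === null || Array.isArray(data)) {
      return ["expected object"];
    }
    for (const key of schema.required || []) {
      if (!(key in data)) errors.push(`missing required field: ${key}`);
    }
    for (const [key, sub] of Object.entries(schema.properties || {})) {
      if (key in data) {
        errors.push(...validate(sub, data[key]).map((e) => `${key}: ${e}`));
      }
    }
  } else if (schema.type === "string") {
    if (typeof data !== "string") errors.push("expected string");
    else if (schema.minLength != null && data.length < schema.minLength) errors.push("too short");
  } else if (schema.type === "integer") {
    if (!Number.isInteger(data)) errors.push("expected integer");
    else if (schema.minimum != null && data < schema.minimum) errors.push("below minimum");
  }
  return errors;
}

const userSchema = {
  type: "object",
  required: ["name", "email"],
  properties: {
    name: { type: "string", minLength: 1 },
    email: { type: "string" },
    age: { type: "integer", minimum: 0 },
  },
};

const ok = validate(userSchema, { name: "Ada", email: "ada@example.com", age: 36 });
// ok is [] -- valid
const bad = validate(userSchema, { name: "", age: -1 });
// bad: missing email, name too short, age below minimum
```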
Security: JSON Hijacking and eval() Injection
JSON's first decade was marked by a serious security vulnerability. Because JSON is valid JavaScript, early web applications parsed API responses with eval(response). This executed any JavaScript embedded in the response, enabling code injection attacks.
The eval() pattern was replaced by JSON.parse() (introduced in ES5, 2009), which only parses data and rejects anything executable. Modern applications should never use eval() to parse JSON — this is a solved problem, but legacy code still exists.
JSON hijacking was a separate attack (2007-2011) where attackers could steal JSON array responses by overriding JavaScript's Array constructor. The fix: never return a bare JSON array from an API. Instead, wrap it in an object ({"data": [...]}) or prefix it with an unparseable string (Google's APIs used )]}' \n as a prefix). Modern browsers patched the Array constructor vulnerability, but the "never return bare arrays" practice persists.
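On the consumer side, a client talking to a prefix-protected API has to strip the prefix before parsing. A minimal sketch, assuming the historical Google-style )]}' prefix (the exact prefix varies per API):

```javascript
// Strip an anti-hijacking prefix before parsing. The prefix makes the
// response body unparseable as a script, defeating cross-site inclusion.
function parseGuarded(body) {
  const PREFIX = ")]}'";
  const clean = body.startsWith(PREFIX) ? body.slice(PREFIX.length) : body;
  return JSON.parse(clean);
}

const response = ")]}'\n{\"data\": [1, 2, 3]}";
const result = parseGuarded(response);
// result.data -> [1, 2, 3]
```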
Modern Security Concerns
Prototype pollution is the current JSON security risk in JavaScript. When a JSON object contains keys like "__proto__" or "constructor", naive parsing and object merging can modify the prototypes of all JavaScript objects, leading to property injection across the entire application. Libraries like lodash.merge were vulnerable to this for years. Defenses: use Object.create(null) for parsed data, validate keys before merging, use libraries with built-in prototype pollution protection.
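Note that JSON.parse itself is safe here: it defines "__proto__" as an ordinary own property. The pollution happens when a naive recursive merge walks into that key. A sketch of both the vulnerable pattern and the key-blocklist defense (function names are illustrative):

```javascript
// Vulnerable pattern: target["__proto__"] resolves to Object.prototype,
// so the recursion writes properties onto every object's prototype.
function naiveMerge(target, source) {
  for (const key of Object.keys(source)) {
    const value = source[key];
    if (typeof value === "object" && value !== null) {
      if (typeof target[key] !== "object" || target[key] === null) target[key] = {};
      naiveMerge(target[key], value); // the bug: walks into __proto__
    } else {
      target[key] = value;
    }
  }
  return target;
}

const hostile = JSON.parse('{"__proto__": {"polluted": true}}');
naiveMerge({}, hostile);
// Now EVERY object inherits the property: ({}).polluted === true
delete Object.prototype.polluted; // clean up the demonstration

// Defense: refuse to merge prototype-bearing keys at all.
const BLOCKED = new Set(["__proto__", "constructor", "prototype"]);
function safeMerge(target, source) {
  for (const key of Object.keys(source)) {
    if (BLOCKED.has(key)) continue;
    const value = source[key];
    if (typeof value === "object" && value !== null) {
      if (typeof target[key] !== "object" || target[key] === null) target[key] = {};
      safeMerge(target[key], value);
    } else {
      target[key] = value;
    }
  }
  return target;
}
safeMerge({}, hostile); // ({}).polluted stays undefined
```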
Denial of service via deeply nested JSON is another concern. A payload like [[[[[...]]]]] nested thousands of levels deep can overflow the parser's stack. Most production JSON parsers have configurable depth limits — set them.
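When the parser itself exposes no depth option (JSON.parse does not), a pre-parse scan is a common defense. The scan below is iterative, so the check itself cannot overflow the stack; the limit of 64 is an arbitrary illustrative choice:

```javascript
// Reject payloads nested deeper than maxDepth before handing them to
// JSON.parse. Tracks strings so brackets inside string values don't count.
function depthOk(text, maxDepth = 64) {
  let depth = 0;
  let inString = false;
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (ch === "\\") i++;              // skip escaped character
      else if (ch === '"') inString = false;
    } else if (ch === '"') {
      inString = true;
    } else if (ch === "{" || ch === "[") {
      if (++depth > maxDepth) return false;
    } else if (ch === "}" || ch === "]") {
      depth--;
    }
  }
  return true;
}

depthOk('{"a": [1, 2, {"b": 3}]}');                    // true (depth 3)
depthOk("[".repeat(100000) + "]".repeat(100000));      // false -- DoS payload rejected
```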
JSON Variants: JSON5, JSONC, JSONL, JSON-LD
The strictness of standard JSON spawned several variants, each relaxing specific constraints:
| Variant | Adds | Used By |
|---|---|---|
| JSON5 | Comments, trailing commas, unquoted keys, hex numbers, single quotes, multiline strings | Build tools, config files |
| JSONC | Comments only (// and /* */) | VS Code settings, tsconfig.json |
| JSONL/NDJSON | One JSON object per line, no wrapping array | BigQuery, Elasticsearch, data pipelines |
| JSON-LD | Linked Data context (@context, @id) | Schema.org, SEO structured data, knowledge graphs |
| GeoJSON | Standardized geographic data structures | Maps, GIS, spatial databases |
Importantly, JSONL and JSON-LD are fully valid JSON — they're conventions on top of standard JSON, not syntax extensions. JSON5 and JSONC are not valid JSON and require their own parsers. Converting JSONC to standard JSON strips the comments; converting JSON to JSON5 just adds the option to use them.
JSON in Practice: Where It Dominates
JSON is the default format for:
- REST APIs — virtually every public API returns JSON. The `Content-Type: application/json` header is the most common on the web.
- NoSQL databases — MongoDB, CouchDB, DynamoDB, and Firestore all store and query JSON (or BSON, a binary JSON variant) natively.
- Configuration — `package.json` (npm), `tsconfig.json` (TypeScript), `composer.json` (PHP), `appsettings.json` (.NET). The lack of comments is widely criticized for config use.
- Data exchange — export data from one system, import into another. JSON preserves types (unlike CSV) without XML's verbosity.
- Web storage — `localStorage` and `sessionStorage` serialize data as JSON strings. IndexedDB stores JavaScript objects (JSON-compatible).
Where JSON is the wrong choice: large tabular datasets (use CSV or Parquet), human-edited config files (use YAML or TOML), document markup (use XML or HTML), and high-performance binary protocols (use Protocol Buffers or MessagePack).
Converting JSON to Other Formats
JSON sits at the center of most format conversion workflows because of its type system and universal support. Common conversions and what happens to your data:
| Conversion | What Happens | Data Loss? |
|---|---|---|
| JSON to CSV | Nested objects flattened to dot notation. Arrays become repeated rows or serialized strings. | Yes — nesting, types |
| JSON to YAML | Direct mapping. YAML can represent everything JSON can, plus comments. | No |
| JSON to XML | Objects become elements, arrays become repeated elements. No attributes in output. | Structural (no attribute distinction) |
| JSON to TOML | Works for flat/shallow objects. Deep nesting becomes verbose TOML tables. | No (if structure fits) |
| JSON to NDJSON | Top-level array split into one object per line. | No |
| JSON to XLSX | Flat arrays become spreadsheet rows. Nested data needs flattening. | Yes — nesting |
| JSON to MessagePack | Binary encoding of same structure. ~30-50% smaller files. | No |
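The "flattened to dot notation" entry in the table deserves a concrete illustration. This sketch shows the transformation many JSON-to-CSV tools apply; here arrays become serialized strings, though other tools repeat rows instead:

```javascript
// Flatten nested JSON to dot-notation keys, as a JSON-to-CSV step would.
function flatten(obj, prefix = "", out = {}) {
  for (const [key, value] of Object.entries(obj)) {
    const path = prefix ? `${prefix}.${key}` : key;
    if (value !== null && typeof value === "object" && !Array.isArray(value)) {
      flatten(value, path, out);          // recurse into nested objects
    } else if (Array.isArray(value)) {
      out[path] = JSON.stringify(value);  // arrays become serialized strings
    } else {
      out[path] = value;                  // scalars copied as-is
    }
  }
  return out;
}

const row = flatten({ user: { name: "Ada", tags: ["a", "b"] }, active: true });
// { "user.name": "Ada", "user.tags": "[\"a\",\"b\"]", "active": true }
```

The data loss noted in the table is visible here: the original nesting and the array's element types cannot be recovered from the flat row without out-of-band knowledge.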
JSON won the data interchange format war through radical simplicity. Six types, no extensions, no parsing ambiguity. Every feature request it rejected — comments, dates, trailing commas — would have added complexity and fragmented implementations. The result is a format that every language, database, and API can produce and consume without configuration.
The practical lesson: use JSON as your default format for structured data exchange. When you hit its limitations — no comments for config files, no streaming for large datasets, no schema for validation — reach for YAML, JSONL, or JSON Schema respectively. But start with JSON. It's the common language your entire stack already speaks.