JSON is the most widely used data interchange format on the planet. Every major programming language has a JSON parser in its standard library. Every REST API returns JSON. Every NoSQL database speaks it natively. It's so ubiquitous that most developers use it without ever reading the specification — which, at 16 pages, is one of the shortest RFCs you'll find.
But that simplicity is precisely what makes JSON powerful. Douglas Crockford didn't invent a new format — he discovered a subset of JavaScript that was already being used informally for data exchange, gave it a name, formalized the grammar, and published it at json.org in 2002. The format was small enough to fit on a business card. Twenty-four years later, it processes trillions of API calls daily.
This guide covers everything a developer needs to know about JSON: its type system, its deliberate omissions, its security history, and how to work around its limitations without switching formats.
History: From JavaScript Subset to RFC 8259
Douglas Crockford first specified JSON in 2001 while working at State Software. The idea was simple: JavaScript already had object literal syntax that could represent structured data. If you stripped out functions, undefined, and other non-data constructs, you got a clean data format that any language could parse.
Crockford registered json.org in 2002 and published the grammar on a single page. The format gained traction rapidly because it was smaller than XML, easier to parse, and directly usable in JavaScript with eval() (a practice that later proved catastrophic for security). The IETF formalized JSON as RFC 4627 in 2006, updated it to RFC 7159 in 2014, and published the current definitive specification as RFC 8259 in 2017. In parallel, ECMA-404 (first published in 2013) standardized the same grammar.
A key philosophical decision: Crockford intentionally kept JSON minimal. When asked why JSON doesn't support comments, he said he removed them because people were using comments to embed parsing directives, which would have destroyed interoperability. JSON's lack of features is a feature.
The Six Data Types
JSON's entire type system fits in a short list:
| Type | Example | Notes |
|---|---|---|
| string | "hello world" | Must use double quotes. Supports Unicode escapes (\u0041). No single quotes. |
| number | 42, 3.14, -1e10 | No distinction between integer and float. No hex, octal, or Infinity/NaN. |
| boolean | true, false | Lowercase only. True and FALSE are syntax errors. |
| null | null | Represents absence of value. Not the same as a missing key. |
| array | [1, "two", null] | Ordered, heterogeneous. Can contain any type, including nested arrays. |
| object | {"key": "value"} | Unordered key-value pairs. Keys must be strings. Duplicate keys are technically allowed by the spec but discouraged — behavior is undefined. |
That's it. Six types, no extensions, no custom types. This constraint is what makes JSON universally parseable — every language can map these six types to native equivalents.
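The full type system fits in a single document. Here is one JSON text exercising all six types, round-tripped through the standard library parser (field names are illustrative):

```javascript
// One document exercising all six JSON types.
const doc = `{
  "string": "hello \\u0041",
  "number": -1e10,
  "boolean": true,
  "nothing": null,
  "array": [1, "two", null],
  "object": { "nested": { "key": "value" } }
}`;

const data = JSON.parse(doc);
data.string;               // "hello A" -- the \u0041 escape decodes to "A"
typeof data.number;        // "number" -- JSON has no separate int/float types
Array.isArray(data.array); // true -- arrays map to native arrays
data.nothing;              // null
```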
The Number Precision Trap
JSON numbers have no size limit in the specification, but implementations do. JavaScript's Number type is a 64-bit IEEE 754 float, which can only represent integers exactly up to 2^53 - 1 (9,007,199,254,740,991). Larger integers — like Twitter's 64-bit snowflake IDs or many database primary keys — lose precision silently when parsed in JavaScript.
This is why many APIs return large IDs as strings: "id": "1234567890123456789" instead of "id": 1234567890123456789. Python and Java handle big integers natively, but if any consumer in the chain is JavaScript-based, string encoding is the safe choice.
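The precision loss is silent, which is what makes it dangerous. A quick demonstration, plus the string workaround combined with BigInt (ES2020+) at the parsing boundary:

```javascript
// Integers past Number.MAX_SAFE_INTEGER (2^53 - 1) lose precision silently.
const max = Number.MAX_SAFE_INTEGER;          // 9007199254740991
const lossy = JSON.parse("9007199254740993"); // 9007199254740992 -- off by one, no error

// The string workaround: transport the ID as text, convert to BigInt
// at the boundary if you need arithmetic on it.
const safe = JSON.parse('{"id": "1234567890123456789"}');
const asBigInt = BigInt(safe.id);             // 1234567890123456789n -- exact
```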
Deliberate Omissions: Comments, Dates, Trailing Commas
JSON's most-requested features are the ones Crockford deliberately left out.
No Comments
Standard JSON has no comment syntax. Not //, not /* */, not #. Crockford removed comments from an early draft after observing that developers were using them to embed parser directives (like // @format: compact), which would have fractured the format into incompatible dialects.
Workarounds: JSON5 adds // and /* */ comments plus trailing commas, unquoted keys, and other human-friendly features. JSONC (JSON with Comments) adds only // and /* */ — VS Code uses JSONC for settings.json. Both are non-standard and rejected by strict parsers. If your config file needs comments, consider converting to YAML or TOML instead.
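If you only need to read a JSONC-style file from standard tooling, a comment stripper is often enough. This is a naive sketch, not production tooling (VS Code's own jsonc-parser handles more edge cases); it does track strings, so comment-like sequences inside string values survive:

```javascript
// Naive JSONC stripper: removes // and /* */ comments while leaving
// comment-like sequences inside strings untouched. A sketch only.
function stripJsonComments(text) {
  let out = "";
  let inString = false;
  let i = 0;
  while (i < text.length) {
    const ch = text[i];
    if (inString) {
      out += ch;
      if (ch === "\\") { out += text[i + 1]; i += 2; continue; } // keep escaped char
      if (ch === '"') inString = false;
      i++;
    } else if (ch === '"') {
      inString = true;
      out += ch;
      i++;
    } else if (ch === "/" && text[i + 1] === "/") {
      while (i < text.length && text[i] !== "\n") i++;  // drop to end of line
    } else if (ch === "/" && text[i + 1] === "*") {
      i += 2;
      while (i < text.length && !(text[i] === "*" && text[i + 1] === "/")) i++;
      i += 2;                                            // skip closing */
    } else {
      out += ch;
      i++;
    }
  }
  return out;
}

const jsonc = `{
  // server settings
  "host": "localhost", /* default */
  "port": 8080,
  "note": "not // a comment"
}`;
const config = JSON.parse(stripJsonComments(jsonc));
```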
No Date Type
JSON has no native date or datetime type. The convention — not a standard, just a widespread convention — is ISO 8601 strings: "2026-03-19T14:30:00Z". But some APIs use Unix timestamps (integers), some use milliseconds since epoch, and some use locale-specific strings like "March 19, 2026". Every JSON consumer must know which convention the producer used.
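In JavaScript, serialization of dates to ISO 8601 happens automatically (via Date.prototype.toJSON), but parsing back does not; a reviver is the standard mechanism. The field name and regex below are illustrative choices, not part of any standard:

```javascript
// JSON.stringify emits Dates as ISO 8601 strings automatically.
// JSON.parse has no date awareness, so a reviver converts them back.
const ISO_DATE = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d{3})?Z$/;

const payload = JSON.stringify({ createdAt: new Date(Date.UTC(2026, 2, 19, 14, 30)) });
// payload: {"createdAt":"2026-03-19T14:30:00.000Z"}

const restored = JSON.parse(payload, (key, value) =>
  typeof value === "string" && ISO_DATE.test(value) ? new Date(value) : value
);
// restored.createdAt is a Date object again
```

The reviver runs on every value, so an overly broad regex can accidentally convert ordinary strings; scope it to known date fields in real code.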
Among common data formats, TOML is notable for having native date and datetime types. If date handling is critical and you control both producer and consumer, converting from JSON to TOML gives you typed dates.
No Trailing Commas
This is valid JavaScript: [1, 2, 3,]. It is invalid JSON. The trailing comma after 3 causes a parse error. This constantly trips up developers hand-editing JSON files, especially when adding or removing items from arrays or objects. It also makes diffs noisier — adding a new last element requires modifying two lines (add a comma to the previous line, then add the new element).
JSON5 allows trailing commas. Standard JSON never will. Many editors (VS Code, JetBrains) can be configured to strip trailing commas on save for JSON files.
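The failure mode is easy to reproduce. JSON.parse rejects the trailing comma outright rather than silently ignoring it:

```javascript
// Strict JSON.parse rejects trailing commas with a SyntaxError.
let error = null;
try {
  JSON.parse("[1, 2, 3,]");
} catch (e) {
  error = e;                      // SyntaxError
}

const ok = JSON.parse("[1, 2, 3]"); // without the comma, parses fine
```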
Parsing Performance and Streaming
JSON parsing is fast. On modern hardware, a good parser (like simdjson) processes JSON at over 2 GB/s by exploiting SIMD instructions. Standard library parsers are slower but still handle typical API responses in microseconds.
The performance problem with JSON is not parse speed — it's memory. A JSON parser must typically load the entire document into memory to parse it, because the format has no line-level structure. A 5GB JSON array is one big string that must be read completely before any record can be accessed.
| Scenario | Standard JSON | JSONL |
|---|---|---|
| 100 records, 50KB | Parse entire file in memory. Fast. | Same speed. No advantage. |
| 10M records, 5GB | Needs 5GB+ RAM to parse. May OOM. | Process line by line. ~100KB RAM. |
| Append new record | Parse entire file, add to array, re-serialize. | Append one line. O(1). |
| Random access to record N | Parse entire file, index into array. | Seek to line N (if pre-indexed). |
For large datasets, converting JSON arrays to NDJSON/JSONL is often the single most impactful optimization. BigQuery, Elasticsearch, and most data pipeline tools accept JSONL natively.
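The conversion itself is trivial. This sketch works on in-memory strings to stay self-contained; a real pipeline would stream with readline over a file handle so only one line is in memory at a time:

```javascript
// JSON array -> JSONL and back. Each line is an independent JSON document,
// so a consumer can process, skip, or resume per line.
const records = [{ id: 1 }, { id: 2 }, { id: 3 }];

// One compact JSON document per line, no wrapping array
const jsonl = records.map((r) => JSON.stringify(r)).join("\n");

// Each line parses independently -- no need to load the whole file
const parsed = jsonl
  .split("\n")
  .filter(Boolean)
  .map((line) => JSON.parse(line));
```

Appending a record is just writing one more line to the end of the file, which is what makes JSONL O(1) for appends in the table above.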
simdjson and Modern Parsers
simdjson (2019) demonstrated that JSON parsing could be radically faster by using SIMD (Single Instruction, Multiple Data) CPU instructions to process 64 bytes of JSON simultaneously. Benchmarks show simdjson parsing at 2-4 GB/s versus ~300 MB/s for traditional parsers — a 6-10x improvement. The library is available in C++, Rust, Go, Python, and Node.js.
For most applications, the standard library parser is fast enough. simdjson matters when you're parsing JSON at infrastructure scale: log aggregation, real-time analytics, high-frequency API gateways.
JSON Schema: Validation Without XML's Verbosity
JSON Schema is a vocabulary for annotating and validating JSON documents. It defines the expected structure: which fields are required, what types they must be, what ranges are valid, and how nested objects should look. It's defined in its own JSON document (a schema is itself JSON), which makes it machine-readable and embeddable.
A minimal schema for a user object:
```json
{
  "type": "object",
  "required": ["name", "email"],
  "properties": {
    "name": { "type": "string", "minLength": 1 },
    "email": { "type": "string", "format": "email" },
    "age": { "type": "integer", "minimum": 0 }
  }
}
```

JSON Schema is widely supported: it's the basis for OpenAPI (Swagger) API specifications, VS Code's settings validation, and many form-building libraries. But unlike XML's XSD, JSON Schema adoption is opt-in. Most JSON in the wild has no schema, which means validation happens in application code rather than at the parsing layer.
The latest specification is JSON Schema 2020-12, which added vocabulary support and better conditional validation. Libraries exist for every major language: ajv (JavaScript), jsonschema (Python), everit-org/json-schema (Java).
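To make the semantics concrete, here is a toy validator covering only the keywords used in the schema above (type, required, properties, minLength, minimum; the "format" keyword is ignored). Real projects should use a full implementation such as ajv; this sketch just shows what a validator does:

```javascript
// Toy JSON Schema validator for a small keyword subset. Illustrative only.
function validate(schema, data) {
  const errors = [];
  if (schema.type === "object") {
    if (typeof data !== "object" || data === null || Array.isArray(data)) {
      return ["expected object"];
    }
    for (const key of schema.required || []) {
      if (!(key in data)) errors.push(`missing required field: ${key}`);
    }
    for (const [key, sub] of Object.entries(schema.properties || {})) {
      if (key in data) {
        errors.push(...validate(sub, data[key]).map((e) => `${key}: ${e}`));
      }
    }
  } else if (schema.type === "string") {
    if (typeof data !== "string") errors.push("expected string");
    else if (schema.minLength != null && data.length < schema.minLength) errors.push("too short");
  } else if (schema.type === "integer") {
    if (!Number.isInteger(data)) errors.push("expected integer");
    else if (schema.minimum != null && data < schema.minimum) errors.push("below minimum");
  }
  return errors;
}

const userSchema = {
  type: "object",
  required: ["name", "email"],
  properties: {
    name: { type: "string", minLength: 1 },
    email: { type: "string" },
    age: { type: "integer", minimum: 0 },
  },
};

const ok = validate(userSchema, { name: "Ada", email: "ada@example.com", age: 36 });
// ok is [] -- valid
const bad = validate(userSchema, { name: "", age: -1 });
// bad: missing email, name too short, age below minimum
```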
Security: JSON Hijacking and eval() Injection
JSON's first decade was marked by a serious security vulnerability. Because JSON is valid JavaScript, early web applications parsed API responses with eval(response). This executed any JavaScript embedded in the response, enabling code injection attacks.
The eval() pattern was replaced by JSON.parse() (introduced in ES5, 2009), which only parses data and rejects anything executable. Modern applications should never use eval() to parse JSON — this is a solved problem, but legacy code still exists.
JSON hijacking was a separate attack (2007-2011) where attackers could steal JSON array responses by overriding JavaScript's Array constructor. The fix: never return a bare JSON array from an API. Instead, wrap it in an object ({"data": [...]}) or prefix it with an unparseable string (Google's APIs used )]}' \n as a prefix). Modern browsers patched the Array constructor vulnerability, but the "never return bare arrays" practice persists.
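On the consumer side, a client talking to a prefix-protected API has to strip the prefix before parsing. A minimal sketch, assuming the historical Google-style )]}' prefix (the exact prefix varies per API):

```javascript
// Strip an anti-hijacking prefix before parsing. The prefix makes the
// response body unparseable as a script, defeating cross-site inclusion.
function parseGuarded(body) {
  const PREFIX = ")]}'";
  const clean = body.startsWith(PREFIX) ? body.slice(PREFIX.length) : body;
  return JSON.parse(clean);
}

const response = ")]}'\n{\"data\": [1, 2, 3]}";
const result = parseGuarded(response);
// result.data -> [1, 2, 3]
```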
Modern Security Concerns
Prototype pollution is the current JSON security risk in JavaScript. When a JSON object contains keys like "__proto__" or "constructor", naive parsing and object merging can modify the prototypes of all JavaScript objects, leading to property injection across the entire application. Libraries like lodash.merge were vulnerable to this for years. Defenses: use Object.create(null) for parsed data, validate keys before merging, use libraries with built-in prototype pollution protection.
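Note that JSON.parse itself is safe here: it defines "__proto__" as an ordinary own property. The pollution happens when a naive recursive merge walks into that key. A sketch of both the vulnerable pattern and the key-blocklist defense (function names are illustrative):

```javascript
// Vulnerable pattern: target["__proto__"] resolves to Object.prototype,
// so the recursion writes properties onto every object's prototype.
function naiveMerge(target, source) {
  for (const key of Object.keys(source)) {
    const value = source[key];
    if (typeof value === "object" && value !== null) {
      if (typeof target[key] !== "object" || target[key] === null) target[key] = {};
      naiveMerge(target[key], value); // the bug: walks into __proto__
    } else {
      target[key] = value;
    }
  }
  return target;
}

const hostile = JSON.parse('{"__proto__": {"polluted": true}}');
naiveMerge({}, hostile);
// Now EVERY object inherits the property: ({}).polluted === true
delete Object.prototype.polluted; // clean up the demonstration

// Defense: refuse to merge prototype-bearing keys at all.
const BLOCKED = new Set(["__proto__", "constructor", "prototype"]);
function safeMerge(target, source) {
  for (const key of Object.keys(source)) {
    if (BLOCKED.has(key)) continue;
    const value = source[key];
    if (typeof value === "object" && value !== null) {
      if (typeof target[key] !== "object" || target[key] === null) target[key] = {};
      safeMerge(target[key], value);
    } else {
      target[key] = value;
    }
  }
  return target;
}
safeMerge({}, hostile); // ({}).polluted stays undefined
```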
Denial of service via deeply nested JSON is another concern. A payload like [[[[[...]]]]] nested thousands of levels deep can overflow the parser's stack. Most production JSON parsers have configurable depth limits — set them.
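When the parser itself exposes no depth option (JSON.parse does not), a pre-parse scan is a common defense. The scan below is iterative, so the check itself cannot overflow the stack; the limit of 64 is an arbitrary illustrative choice:

```javascript
// Reject payloads nested deeper than maxDepth before handing them to
// JSON.parse. Tracks strings so brackets inside string values don't count.
function depthOk(text, maxDepth = 64) {
  let depth = 0;
  let inString = false;
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (ch === "\\") i++;              // skip escaped character
      else if (ch === '"') inString = false;
    } else if (ch === '"') {
      inString = true;
    } else if (ch === "{" || ch === "[") {
      if (++depth > maxDepth) return false;
    } else if (ch === "}" || ch === "]") {
      depth--;
    }
  }
  return true;
}

depthOk('{"a": [1, 2, {"b": 3}]}');                    // true (depth 3)
depthOk("[".repeat(100000) + "]".repeat(100000));      // false -- DoS payload rejected
```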
JSON Variants: JSON5, JSONC, JSONL, JSON-LD
The strictness of standard JSON spawned several variants, each relaxing specific constraints:
| Variant | Adds | Used By |
|---|---|---|
| JSON5 | Comments, trailing commas, unquoted keys, hex numbers, single quotes, multiline strings | Build tools, config files |
| JSONC | Comments only (// and /* */) | VS Code settings, tsconfig.json |
| JSONL/NDJSON | One JSON object per line, no wrapping array | BigQuery, Elasticsearch, data pipelines |
| JSON-LD | Linked Data context (@context, @id) | Schema.org, SEO structured data, knowledge graphs |
| GeoJSON | Standardized geographic data structures | Maps, GIS, spatial databases |
Importantly, JSONL and JSON-LD are fully valid JSON — they're conventions on top of standard JSON, not syntax extensions. JSON5 and JSONC are not valid JSON and require their own parsers. Converting JSONC to standard JSON strips the comments; converting JSON to JSON5 just adds the option to use them.
JSON in Practice: Where It Dominates
JSON is the default format for:
- REST APIs — virtually every public API returns JSON. The `Content-Type: application/json` header is the most common on the web.
- NoSQL databases — MongoDB, CouchDB, DynamoDB, and Firestore all store and query JSON (or BSON, a binary JSON variant) natively.
- Configuration — `package.json` (npm), `tsconfig.json` (TypeScript), `composer.json` (PHP), `appsettings.json` (.NET). The lack of comments is widely criticized for config use.
- Data exchange — export data from one system, import into another. JSON preserves types (unlike CSV) without XML's verbosity.
- Web storage — `localStorage` and `sessionStorage` serialize data as JSON strings. IndexedDB stores JavaScript objects (JSON-compatible).
Where JSON is the wrong choice: large tabular datasets (use CSV or Parquet), human-edited config files (use YAML or TOML), document markup (use XML or HTML), and high-performance binary protocols (use Protocol Buffers or MessagePack).
Converting JSON to Other Formats
JSON sits at the center of most format conversion workflows because of its type system and universal support. Common conversions and what happens to your data:
| Conversion | What Happens | Data Loss? |
|---|---|---|
| JSON to CSV | Nested objects flattened to dot notation. Arrays become repeated rows or serialized strings. | Yes — nesting, types |
| JSON to YAML | Direct mapping. YAML can represent everything JSON can, plus comments. | No |
| JSON to XML | Objects become elements, arrays become repeated elements. No attributes in output. | Structural (no attribute distinction) |
| JSON to TOML | Works for flat/shallow objects. Deep nesting becomes verbose TOML tables. | No (if structure fits) |
| JSON to NDJSON | Top-level array split into one object per line. | No |
| JSON to XLSX | Flat arrays become spreadsheet rows. Nested data needs flattening. | Yes — nesting |
| JSON to MessagePack | Binary encoding of same structure. ~30-50% smaller files. | No |
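The "flattened to dot notation" entry in the table deserves a concrete illustration. This sketch shows the transformation many JSON-to-CSV tools apply; here arrays become serialized strings, though other tools repeat rows instead:

```javascript
// Flatten nested JSON to dot-notation keys, as a JSON-to-CSV step would.
function flatten(obj, prefix = "", out = {}) {
  for (const [key, value] of Object.entries(obj)) {
    const path = prefix ? `${prefix}.${key}` : key;
    if (value !== null && typeof value === "object" && !Array.isArray(value)) {
      flatten(value, path, out);          // recurse into nested objects
    } else if (Array.isArray(value)) {
      out[path] = JSON.stringify(value);  // arrays become serialized strings
    } else {
      out[path] = value;                  // scalars copied as-is
    }
  }
  return out;
}

const row = flatten({ user: { name: "Ada", tags: ["a", "b"] }, active: true });
// { "user.name": "Ada", "user.tags": "[\"a\",\"b\"]", "active": true }
```

The data loss noted in the table is visible here: the original nesting and the array's element types cannot be recovered from the flat row without out-of-band knowledge.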
JSON won the data interchange format war through radical simplicity. Six types, no extensions, no parsing ambiguity. Every feature request it rejected — comments, dates, trailing commas — would have added complexity and fragmented implementations. The result is a format that every language, database, and API can produce and consume without configuration.
The practical lesson: use JSON as your default format for structured data exchange. When you hit its limitations — no comments for config files, no streaming for large datasets, no schema for validation — reach for YAML, JSONL, or JSON Schema respectively. But start with JSON. It's the common language your entire stack already speaks.