Standard JSON has a scaling problem. A JSON array of 10 million records is one giant string that must be parsed entirely to access any single record. You can't stream it, you can't append to it without parsing first, and it can't be processed in parallel because you need the closing bracket to know the document is valid. JSONL solves all three problems by putting one JSON object per line.
The format is trivially simple: each line is a complete, valid JSON value (usually an object), terminated by a newline character (\n). No wrapping array, no commas between records. That's the entire specification. This simplicity makes JSONL the standard format for log files, data pipelines, machine learning datasets, and anywhere you need to process JSON at scale.
The Format: One JSON Object Per Line
{"name": "Alice", "age": 30, "city": "New York"}
{"name": "Bob", "age": 25, "city": "London"}
{"name": "Charlie", "age": 35, "city": "Tokyo"}Rules:
- Each line is a valid JSON value (objects are most common, but arrays and primitives are allowed).
- Lines are separated by \n (newline). Some implementations also accept \r\n.
- No comma between lines. No wrapping array. No opening or closing brackets for the file.
- Each line must be self-contained — no JSON value spans multiple lines. This means the JSON within a line must not contain literal newlines (use the \n escape within strings).
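To illustrate the self-containment rule, a short Python sketch: `json.dumps` escapes embedded newlines automatically, so a serialized record always fits on one line.

```python
import json

# A string containing a real newline — illegal as-is inside a JSONL line.
record = {"name": "Alice", "bio": "Line one\nLine two"}

# json.dumps escapes the newline as the two characters '\' and 'n',
# so the serialized record still occupies a single line.
line = json.dumps(record)
assert "\n" not in line           # no literal newline in the output
assert json.loads(line) == record  # round-trips losslessly
```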
The equivalent standard JSON would be:
[
{"name": "Alice", "age": 30, "city": "New York"},
{"name": "Bob", "age": 25, "city": "London"},
{"name": "Charlie", "age": 35, "city": "Tokyo"}
]

For 3 records, the difference is cosmetic. For 3 million records, it's the difference between streaming (constant memory) and loading everything into RAM.
JSONL vs NDJSON: Same Format, Different Names
JSONL (JSON Lines, jsonlines.org) and NDJSON (Newline Delimited JSON, ndjson.org) are the same format with different names. Both specify one JSON value per line, separated by newlines. The specifications are essentially identical.
| Name | File Extension | MIME Type | Spec URL |
|---|---|---|---|
| JSON Lines | .jsonl | application/jsonl | jsonlines.org |
| NDJSON | .ndjson | application/x-ndjson | ndjson.org |
In practice, .jsonl is the more common extension. BigQuery, OpenAI, and most data tools use "JSONL" as the term. Elasticsearch uses "NDJSON" for its bulk API. The terms are interchangeable — don't spend time debating which to use. Just pick one and be consistent.
Why JSONL Exists: The Three Problems It Solves
JSONL addresses three fundamental limitations of standard JSON arrays:
1. Streaming: Constant-Memory Processing
A standard JSON array must be fully parsed before any record can be accessed. A 5GB JSON file needs 5GB+ of RAM. JSONL files can be processed line by line — each line is independently parseable, so memory usage is proportional to the largest single record, not the file size.
# Processing a 5GB JSON array: needs ~10GB RAM
import json
with open('huge.json') as f:
    data = json.load(f)  # Loads entire file into memory
for record in data:
    process(record)

# Processing a 5GB JSONL file: needs ~1MB RAM
with open('huge.jsonl') as f:
    for line in f:
        record = json.loads(line)  # One record at a time
        process(record)

This is the same advantage CSV has over JSON for large files — line-oriented formats can be streamed. JSONL gives you streaming with JSON's type system.
2. Appending: O(1) Record Addition
Adding a record to a JSON array requires: read entire file, parse JSON, append to array, serialize entire array, write entire file. For a 1GB file, this takes seconds and requires 2GB+ of memory.
Adding a record to a JSONL file requires: open file in append mode, write one line. This is O(1) regardless of file size — it takes the same time whether the file has 100 records or 100 million. This makes JSONL ideal for log files and any application where records are continuously appended.
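A minimal sketch of O(1) appending in Python (the `append_record` helper and the `events.jsonl` filename are illustrative):

```python
import json

def append_record(path, record):
    """Append one record to a JSONL file in O(1) time.

    Opening in append mode ('a') seeks to the end without reading
    the file, so the cost is independent of file size.
    """
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

append_record("events.jsonl", {"event": "login", "user": "alice"})
```

Compare this with a JSON array, where the same operation means a full read-parse-serialize-write cycle.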
3. Parallelism: Split and Conquer
Because each line is independent, JSONL files can be split at any line boundary and processed in parallel. Unix tools like split can divide a JSONL file into chunks, and each chunk can be processed by a separate worker. Map-reduce workflows on JSONL files are trivial to implement.
Standard JSON arrays can't be split because the parser needs the entire array structure (opening bracket, commas, closing bracket) to validate the document. You'd need a streaming JSON parser that understands array boundaries, which is significantly more complex.
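One way to sketch line-boundary parallelism in Python with the standard library's `multiprocessing` module (the `parallel_count` helper, the `age` field, and the chunk size are illustrative assumptions):

```python
import json
from multiprocessing import Pool

def process_chunk(lines):
    # Each worker parses its own lines independently; no shared state.
    return sum(1 for line in lines if json.loads(line)["age"] > 30)

def parallel_count(path, workers=4, chunk_size=1000):
    # Split the file at line boundaries into chunks of chunk_size lines.
    chunks, current = [], []
    with open(path) as f:
        for line in f:
            current.append(line)
            if len(current) == chunk_size:
                chunks.append(current)
                current = []
    if current:
        chunks.append(current)
    # Fan the chunks out to worker processes and combine the results.
    with Pool(workers) as pool:
        return sum(pool.map(process_chunk, chunks))
```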
Who Uses JSONL: Adoption Across the Industry
| Platform/Tool | How JSONL Is Used |
|---|---|
| Google BigQuery | Native import/export format. bq load --source_format=NEWLINE_DELIMITED_JSON |
| Elasticsearch | Bulk API uses NDJSON: action+document pairs, one per line. |
| OpenAI | Fine-tuning datasets use JSONL. Each line is a training example. |
| Amazon Athena | Queries JSONL files in S3 directly. |
| Datadog / Splunk | Structured log ingestion via JSONL. |
| MongoDB | mongoexport outputs JSONL by default (--type=json is actually JSONL). |
| Hugging Face | Dataset format for ML training data. |
| jq | Processes JSONL natively (one value per input line); -s/--slurp collects lines into an array when array operations are needed. |
The pattern: any system that processes large volumes of JSON data supports JSONL. It's the de facto standard for JSON-at-scale.
JSONL vs CSV: When to Use Each
Both JSONL and CSV are line-oriented, streamable formats. The choice depends on your data structure:
| Feature | JSONL | CSV |
|---|---|---|
| Types | String, number, boolean, null, array, object | Everything is a string |
| Nesting | Yes (objects, arrays within objects) | No (flat only) |
| Schema flexibility | Each line can have different keys | All rows must match header columns |
| File size | Larger (key names repeated per line) | Smaller (keys in header only) |
| Parse speed | Slower (JSON parsing per line) | Faster (simple delimiter split) |
| Spreadsheet support | None (requires conversion) | Universal |
| Streaming | Yes | Yes |
Use JSONL when your data has nesting, mixed types, or variable schemas (log events with different fields). Use CSV when your data is flat, consistent, and will be consumed by spreadsheets or legacy tools. Converting JSONL to CSV flattens the data; converting CSV to JSONL adds structure.
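A sketch of the flattening direction in Python's standard library, assuming flat records with consistent keys (nested fields would need to be flattened to dot-notation first; the `jsonl_to_csv` helper is illustrative):

```python
import csv
import io
import json

def jsonl_to_csv(jsonl_text):
    """Convert flat JSONL records to CSV text (assumes consistent keys)."""
    records = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(records[0]))
    writer.writeheader()   # keys become the CSV header row
    writer.writerows(records)
    return out.getvalue()

print(jsonl_to_csv('{"name": "Alice", "age": 30}\n{"name": "Bob", "age": 25}\n'))
```

Note what the conversion loses: the `age` values were JSON numbers, but in CSV they are just strings until a consumer re-parses them.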
Working with JSONL: Tools and Patterns
JSONL's line-oriented structure makes it compatible with standard Unix tools:
# Count records
wc -l data.jsonl
# First 10 records
head -n 10 data.jsonl
# Filter with jq
jq 'select(.age > 30)' data.jsonl
# Extract a field
jq -r '.name' data.jsonl
# Sort by a field (slurp into array, sort, output as JSONL)
jq -s 'sort_by(.age)[]' data.jsonl
# Convert to CSV (using jq + standard tools)
jq -r '[.name, .age, .city] | @csv' data.jsonl
# Split into 1000-line chunks
split -l 1000 data.jsonl chunk_
# Parallel processing (GNU parallel)
cat data.jsonl | parallel --pipe -L 1000 'process_chunk.py'

Every operation that works on text lines works on JSONL. This interoperability with the Unix tool ecosystem is a major practical advantage.
Converting Between JSON and JSONL
JSON array to JSONL: Extract each element from the array and write it as a separate line. In jq: jq -c '.[]' array.json > data.jsonl. The -c flag outputs compact JSON (one line per object).
JSONL to JSON array: Read all lines, wrap in an array. In jq: jq -s '.' data.jsonl > array.json. The -s (slurp) flag reads all inputs into an array. Be aware: this loads the entire file into memory, which defeats the purpose for large files.
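The same two conversions can be sketched in Python's standard library (both helper names are illustrative; like jq -s, both load the full dataset into memory, so a truly streaming array-to-JSONL conversion would need an incremental parser):

```python
import json

def array_to_jsonl(src, dst):
    """JSON array file -> JSONL: one compact record per line."""
    with open(src) as fin, open(dst, "w") as fout:
        for record in json.load(fin):  # json.load reads the whole array
            fout.write(json.dumps(record) + "\n")

def jsonl_to_array(src, dst):
    """JSONL -> JSON array file (reads all lines, like jq -s)."""
    with open(src) as fin, open(dst, "w") as fout:
        json.dump([json.loads(line) for line in fin], fout)
```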
Other common conversions:
- CSV to JSONL — each row becomes a JSON object with column headers as keys
- JSONL to CSV — flatten objects to columns (nested fields need dot-notation)
- JSONL to XLSX — for spreadsheet consumption
- CSV to NDJSON — same as CSV to JSONL
JSONL Best Practices
- One JSON value per line, no exceptions. Never break a JSON object across lines. If an object contains a string with a newline, it must be escaped as \n within the JSON, not written as a literal newline.
- Use compact JSON. No pretty-printing, no indentation. Each line should be as short as possible. Readable JSON is for human-edited files; JSONL is for machine processing.
- Use consistent schemas. While JSONL allows different keys per line, consistent schemas make downstream processing much easier. Consider adding a type or version field if your schema evolves over time.
- Use UTF-8 encoding. Like standard JSON, JSONL should always be UTF-8. No BOM needed (unlike CSV).
- Compress with gzip for storage and transfer. JSONL files compress exceptionally well (often 90%+ reduction) because JSON's repetitive key names and structural characters are highly compressible. Many tools accept .jsonl.gz directly.
- Validate before ingesting. A single malformed line in a JSONL file can break processing. Validate each line independently: python -c "import json, sys; [json.loads(l) for l in sys.stdin]" < data.jsonl.
JSONL is the format you use when you need JSON's type system at CSV's scale. It's the natural answer to "I have a JSON array that's too large to fit in memory" — split it into one record per line and every scaling problem disappears. Streaming, appending, parallel processing, and Unix tool compatibility all come for free from the one-line-per-record constraint.
If you're building data pipelines, processing log files, preparing ML training data, or working with any dataset over 100MB in JSON format, convert to JSONL. The format change is trivial (remove the wrapping array, one object per line) and the operational benefits are immediate.