Standard JSON has a scaling problem. A JSON array of 10 million records is one giant string that must be parsed entirely to access any single record. You can't stream it, you can't append to it without parsing first, and it can't be processed in parallel because you need the closing bracket to know the document is valid. JSONL solves all three problems by putting one JSON object per line.
The format is trivially simple: each line is a complete, valid JSON value (usually an object), terminated by a newline character (\n). No wrapping array, no commas between records. That's the entire specification. This simplicity makes JSONL the standard format for log files, data pipelines, machine learning datasets, and anywhere you need to process JSON at scale.
The Format: One JSON Object Per Line
{"name": "Alice", "age": 30, "city": "New York"}
{"name": "Bob", "age": 25, "city": "London"}
{"name": "Charlie", "age": 35, "city": "Tokyo"}Rules:
- Each line is a valid JSON value (objects are most common, but arrays and primitives are allowed).
- Lines are separated by \n (newline). Some implementations also accept \r\n.
- No comma between lines. No wrapping array. No opening or closing brackets for the file.
- Each line must be self-contained — no JSON value spans multiple lines. This means the JSON within a line must not contain literal newlines (use the \n escape within strings).
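To illustrate the self-containment rule, a short Python sketch: `json.dumps` escapes embedded newlines automatically, so a serialized record always fits on one line.

```python
import json

# A string containing a real newline — illegal as-is inside a JSONL line.
record = {"name": "Alice", "bio": "Line one\nLine two"}

# json.dumps escapes the newline as the two characters '\' and 'n',
# so the serialized record still occupies a single line.
line = json.dumps(record)
assert "\n" not in line           # no literal newline in the output
assert json.loads(line) == record  # round-trips losslessly
```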
The equivalent standard JSON would be:
[
{"name": "Alice", "age": 30, "city": "New York"},
{"name": "Bob", "age": 25, "city": "London"},
{"name": "Charlie", "age": 35, "city": "Tokyo"}
]

For 3 records, the difference is cosmetic. For 3 million records, it's the difference between streaming (constant memory) and loading everything into RAM.
JSONL vs NDJSON: Same Format, Different Names
JSONL (JSON Lines, jsonlines.org) and NDJSON (Newline Delimited JSON, ndjson.org) are the same format with different names. Both specify one JSON value per line, separated by newlines. The specifications are essentially identical.
| Name | File Extension | MIME Type | Spec URL |
|---|---|---|---|
| JSON Lines | .jsonl | application/jsonl | jsonlines.org |
| NDJSON | .ndjson | application/x-ndjson | ndjson.org |
In practice, .jsonl is the more common extension. BigQuery, OpenAI, and most data tools use "JSONL" as the term. Elasticsearch uses "NDJSON" for its bulk API. The terms are interchangeable — don't spend time debating which to use. Just pick one and be consistent.
Why JSONL Exists: The Three Problems It Solves
JSONL addresses three fundamental limitations of standard JSON arrays:
1. Streaming: Constant-Memory Processing
A standard JSON array must be fully parsed before any record can be accessed. A 5GB JSON file needs 5GB+ of RAM. JSONL files can be processed line by line — each line is independently parseable, so memory usage is proportional to the largest single record, not the file size.
# Processing a 5GB JSON array: needs ~10GB RAM
import json
with open('huge.json') as f:
    data = json.load(f)  # Loads entire file into memory
for record in data:
    process(record)

# Processing a 5GB JSONL file: needs ~1MB RAM
with open('huge.jsonl') as f:
    for line in f:
        record = json.loads(line)  # One record at a time
        process(record)

This is the same advantage CSV has over JSON for large files — line-oriented formats can be streamed. JSONL gives you streaming with JSON's type system.
2. Appending: O(1) Record Addition
Adding a record to a JSON array requires: read entire file, parse JSON, append to array, serialize entire array, write entire file. For a 1GB file, this takes seconds and requires 2GB+ of memory.
Adding a record to a JSONL file requires: open file in append mode, write one line. This is O(1) regardless of file size — it takes the same time whether the file has 100 records or 100 million. This makes JSONL ideal for log files and any application where records are continuously appended.
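A minimal sketch of O(1) appending in Python (the `append_record` helper and the `events.jsonl` filename are illustrative):

```python
import json

def append_record(path, record):
    """Append one record to a JSONL file in O(1) time.

    Opening in append mode ('a') seeks to the end without reading
    the file, so the cost is independent of file size.
    """
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

append_record("events.jsonl", {"event": "login", "user": "alice"})
```

Compare this with a JSON array, where the same operation means a full read-parse-serialize-write cycle.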
3. Parallelism: Split and Conquer
Because each line is independent, JSONL files can be split at any line boundary and processed in parallel. Unix tools like split can divide a JSONL file into chunks, and each chunk can be processed by a separate worker. Map-reduce workflows on JSONL files are trivial to implement.
Standard JSON arrays can't be split because the parser needs the entire array structure (opening bracket, commas, closing bracket) to validate the document. You'd need a streaming JSON parser that understands array boundaries, which is significantly more complex.
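One way to sketch line-boundary parallelism in Python with the standard library's `multiprocessing` module (the `parallel_count` helper, the `age` field, and the chunk size are illustrative assumptions):

```python
import json
from multiprocessing import Pool

def process_chunk(lines):
    # Each worker parses its own lines independently; no shared state.
    return sum(1 for line in lines if json.loads(line)["age"] > 30)

def parallel_count(path, workers=4, chunk_size=1000):
    # Split the file at line boundaries into chunks of chunk_size lines.
    chunks, current = [], []
    with open(path) as f:
        for line in f:
            current.append(line)
            if len(current) == chunk_size:
                chunks.append(current)
                current = []
    if current:
        chunks.append(current)
    # Fan the chunks out to worker processes and combine the results.
    with Pool(workers) as pool:
        return sum(pool.map(process_chunk, chunks))
```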
Who Uses JSONL: Adoption Across the Industry
| Platform/Tool | How JSONL Is Used |
|---|---|
| Google BigQuery | Native import/export format. bq load --source_format=NEWLINE_DELIMITED_JSON |
| Elasticsearch | Bulk API uses NDJSON: action+document pairs, one per line. |
| OpenAI | Fine-tuning datasets use JSONL. Each line is a training example. |
| Amazon Athena | Queries JSONL files in S3 directly. |
| Datadog / Splunk | Structured log ingestion via JSONL. |
| MongoDB | mongoexport outputs JSONL by default (--type=json is actually JSONL). |
| Hugging Face | Dataset format for ML training data. |
| jq | Processes JSONL natively (one value per input line); -s/--slurp collects lines into an array when array operations are needed. |
The pattern: any system that processes large volumes of JSON data supports JSONL. It's the de facto standard for JSON-at-scale.
JSONL vs CSV: When to Use Each
Both JSONL and CSV are line-oriented, streamable formats. The choice depends on your data structure:
| Feature | JSONL | CSV |
|---|---|---|
| Types | String, number, boolean, null, array, object | Everything is a string |
| Nesting | Yes (objects, arrays within objects) | No (flat only) |
| Schema flexibility | Each line can have different keys | All rows must match header columns |
| File size | Larger (key names repeated per line) | Smaller (keys in header only) |
| Parse speed | Slower (JSON parsing per line) | Faster (simple delimiter split) |
| Spreadsheet support | None (requires conversion) | Universal |
| Streaming | Yes | Yes |
Use JSONL when your data has nesting, mixed types, or variable schemas (log events with different fields). Use CSV when your data is flat, consistent, and will be consumed by spreadsheets or legacy tools. Converting JSONL to CSV flattens the data; converting CSV to JSONL adds structure.
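A sketch of the flattening direction in Python's standard library, assuming flat records with consistent keys (nested fields would need to be flattened to dot-notation first; the `jsonl_to_csv` helper is illustrative):

```python
import csv
import io
import json

def jsonl_to_csv(jsonl_text):
    """Convert flat JSONL records to CSV text (assumes consistent keys)."""
    records = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(records[0]))
    writer.writeheader()   # keys become the CSV header row
    writer.writerows(records)
    return out.getvalue()

print(jsonl_to_csv('{"name": "Alice", "age": 30}\n{"name": "Bob", "age": 25}\n'))
```

Note what the conversion loses: the `age` values were JSON numbers, but in CSV they are just strings until a consumer re-parses them.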
Working with JSONL: Tools and Patterns
JSONL's line-oriented structure makes it compatible with standard Unix tools:
# Count records
wc -l data.jsonl
# First 10 records
head -n 10 data.jsonl
# Filter with jq
jq 'select(.age > 30)' data.jsonl
# Extract a field
jq -r '.name' data.jsonl
# Sort by a field (slurp into array, sort, output as JSONL)
jq -s 'sort_by(.age)[]' data.jsonl
# Convert to CSV (using jq + standard tools)
jq -r '[.name, .age, .city] | @csv' data.jsonl
# Split into 1000-line chunks
split -l 1000 data.jsonl chunk_
# Parallel processing (GNU parallel)
cat data.jsonl | parallel --pipe -L 1000 'process_chunk.py'

Every operation that works on text lines works on JSONL. This interoperability with the Unix tool ecosystem is a major practical advantage.
Converting Between JSON and JSONL
JSON array to JSONL: Extract each element from the array and write it as a separate line. In jq: jq -c '.[]' array.json > data.jsonl. The -c flag outputs compact JSON (one line per object).
JSONL to JSON array: Read all lines, wrap in an array. In jq: jq -s '.' data.jsonl > array.json. The -s (slurp) flag reads all inputs into an array. Be aware: this loads the entire file into memory, which defeats the purpose for large files.
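The same two conversions can be sketched in Python's standard library (both helper names are illustrative; like jq -s, both load the full dataset into memory, so a truly streaming array-to-JSONL conversion would need an incremental parser):

```python
import json

def array_to_jsonl(src, dst):
    """JSON array file -> JSONL: one compact record per line."""
    with open(src) as fin, open(dst, "w") as fout:
        for record in json.load(fin):  # json.load reads the whole array
            fout.write(json.dumps(record) + "\n")

def jsonl_to_array(src, dst):
    """JSONL -> JSON array file (reads all lines, like jq -s)."""
    with open(src) as fin, open(dst, "w") as fout:
        json.dump([json.loads(line) for line in fin], fout)
```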
Other common conversions:
- CSV to JSONL — each row becomes a JSON object with column headers as keys
- JSONL to CSV — flatten objects to columns (nested fields need dot-notation)
- JSONL to XLSX — for spreadsheet consumption
- CSV to NDJSON — same as CSV to JSONL
JSONL Best Practices
- One JSON value per line, no exceptions. Never break a JSON object across lines. If an object contains a string with a newline, it must be escaped as \n within the JSON, not written as a literal newline.
- Use compact JSON. No pretty-printing, no indentation. Each line should be as short as possible. Readable JSON is for human-edited files; JSONL is for machine processing.
- Use consistent schemas. While JSONL allows different keys per line, consistent schemas make downstream processing much easier. Consider adding a type or version field if your schema evolves over time.
- Use UTF-8 encoding. Like standard JSON, JSONL should always be UTF-8. No BOM needed (unlike CSV).
- Compress with gzip for storage and transfer. JSONL files compress exceptionally well (often 90%+ reduction) because JSON's repetitive key names and structural characters are highly compressible. Many tools accept .jsonl.gz directly.
- Validate before ingesting. A single malformed line in a JSONL file can break processing. Validate each line independently: python -c "import json, sys; [json.loads(l) for l in sys.stdin]" < data.jsonl.
JSONL is the format you use when you need JSON's type system at CSV's scale. It's the natural answer to "I have a JSON array that's too large to fit in memory" — split it into one record per line and every scaling problem disappears. Streaming, appending, parallel processing, and Unix tool compatibility all come for free from the one-line-per-record constraint.
If you're building data pipelines, processing log files, preparing ML training data, or working with any dataset over 100MB in JSON format, convert to JSONL. The format change is trivial (remove the wrapping array, one object per line) and the operational benefits are immediate.