
NDJSON and JSON Lines: Streaming JSON at Scale

Why one JSON object per line beats a giant array for streaming, logs, and big data, with Node and Python examples for reading, writing, and converting NDJSON.

JSONPost · 6 min read

A single JSON array works beautifully until your dataset stops fitting in memory. The moment you have a million records, a long-running export, or an append-only log, that one giant array becomes a liability: a standard parser cannot hand you anything until the last byte arrives, you cannot append to it without rewriting the whole file, and a single truncated download leaves you with nothing usable. NDJSON solves all three problems by storing one JSON value per line. This post explains how it works, where it shows up, and how to read, write, and convert it in real code. When you need a quick conversion, the NDJSON Converter handles arrays to lines and back.

What NDJSON and JSON Lines are

NDJSON stands for Newline Delimited JSON. JSON Lines, often abbreviated JSONL, is the same idea documented under a different name. In practice the terms are interchangeable: each line of the file is a complete, valid JSON value, and a newline character separates one record from the next. There is no enclosing array and no commas between records.

Here is a small NDJSON file with three records:

{"id": 1, "event": "login", "ts": "2026-04-15T09:00:00Z"}
{"id": 2, "event": "purchase", "amount": 49.99}
{"id": 3, "event": "logout", "ts": "2026-04-15T09:14:32Z"}

Notice what is absent: no leading [, no trailing ], no comma after each object. Each line stands alone. You could delete the middle line, append a new one, or stop reading halfway through and still have valid records in hand.

Why one object per line beats a giant array

A standard JSON array forces an all-or-nothing parse. General-purpose parsers such as JSON.parse in Node or json.load in Python must see the closing bracket before they hand you a valid structure, so the whole document ends up buffered in memory. Incremental parsers exist, but they add a dependency and extra complexity. With NDJSON you read one line, parse it, process it, and discard it before moving on. Memory use is bounded by the largest single record, whether the file holds a thousand records or a billion.

Appending is the other big win. Adding a record to a JSON array means parsing it, pushing an element, and re-serializing the entire file. Appending to NDJSON is a single file write: open in append mode, write the serialized object, write a newline, done. That is exactly what you want for logs and event streams.

array file:   read all -> parse all -> mutate -> write all
ndjson file:  open append -> write one line -> close
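
The append path can be sketched in Python as follows. This is a minimal illustration, not a production logger: the helper name append_record and the file name events.ndjson are hypothetical, and the demo writes to a temporary directory.

```python
import json
import os
import tempfile


def append_record(path, record):
    """Append one record as a single compact JSON line."""
    # Mode "a" positions every write at the end of the file,
    # so existing records are never read or rewritten.
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


# Demo in a temporary directory (hypothetical file name).
demo_path = os.path.join(tempfile.mkdtemp(), "events.ndjson")
append_record(demo_path, {"id": 1, "event": "login"})
append_record(demo_path, {"id": 2, "event": "logout"})

# Read the file back to confirm both records landed on their own lines.
with open(demo_path, encoding="utf-8") as f:
    appended = [json.loads(line) for line in f]
print(appended)
```

Each call touches only the end of the file, so the cost of an append does not grow with the number of records already stored.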

Truncation tolerance rounds it out. If a network transfer or a crashed process cuts an NDJSON file mid-stream, every complete line before the break is still valid. A truncated JSON array, by contrast, fails to parse in a standard parser, and recovering its records takes special tooling.
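
To make the truncation point concrete, here is a small Python sketch that salvages the complete records from a cut-off stream. The function name parse_complete_lines and the sample data are hypothetical, and the sketch assumes every record is a JSON object, so a partial line can never parse by accident.

```python
import json


def parse_complete_lines(text):
    """Return the records that parse, plus the number of lines skipped."""
    records = []
    skipped = 0
    for line in text.split("\n"):
        if not line.strip():
            continue
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            skipped += 1  # typically the cut-off final line
    return records, skipped


# Simulated transfer that was cut off in the middle of the third record.
truncated = (
    '{"id": 1, "event": "login"}\n'
    '{"id": 2, "event": "purchase", "amount": 49.99}\n'
    '{"id": 3, "event": "log'
)

records, skipped = parse_complete_lines(truncated)
print(len(records), skipped)
```

The first two records survive intact and only the partial third line is dropped. Whether to skip a bad line silently, log it, or abort is a design choice that depends on the pipeline.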

Where you will see it

NDJSON is everywhere in data infrastructure. Application and server logs are typically one JSON object per line so tools like Logstash and Vector can tail and ship them. Google BigQuery accepts newline-delimited JSON for bulk loads, and Amazon Athena can query JSON data stored one record per line. The OpenAI batch and fine-tuning APIs expect JSONL files where each line is a training example or request. Machine learning datasets ship as JSONL because frameworks can stream them without loading everything at once. Command-line tools such as docker and kubectl can also emit streams of JSON objects, often one per line, when asked for JSON output.

Reading NDJSON line by line in Node

The wrong way is to read the whole file and split on newlines, because that defeats the streaming benefit. Use a readline interface over a file stream so each line is parsed as it arrives.

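// Run as an ES module (a .mjs file or "type": "module") so top-level await is available.
import fs from "node:fs";
import readline from "node:readline";

const rl = readline.createInterface({
  input: fs.createReadStream("events.ndjson"),
  crlfDelay: Infinity, // always treat \r\n as a single line break
});

for await (const line of rl) {
  if (line.trim() === "") continue; // skip blank lines
  const record = JSON.parse(line); // throws SyntaxError on a malformed line
  console.log(record.event, record.id);
}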

Memory usage stays roughly constant no matter how large events.ndjson is, because the stream is read in chunks and only the current line is held for parsing. Setting crlfDelay to Infinity makes readline treat every \r\n pair as a single line break, which handles Windows line endings, and the blank-line guard protects against empty lines anywhere in the file.

Reading NDJSON line by line in Python

Python makes this even shorter because iterating a file object yields one line at a time.

import json

with open("events.ndjson", "r", encoding="utf-8") as f:
    for line in f:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        print(record["event"], record["id"])

Again, the file is never fully loaded. This pattern scales to multi-gigabyte files on a laptop. For writing, the inverse is just as simple:

import json

records = [
    {"id": 1, "event": "login"},
    {"id": 2, "event": "purchase", "amount": 49.99},
]

with open("out.ndjson", "w", encoding="utf-8") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")

The key detail is json.dumps per record plus an explicit newline, never a single json.dump of the whole list.

Converting between arrays and NDJSON

You will constantly need to move between the two formats. An API might return a JSON array that you want to feed into BigQuery as NDJSON, or you might have an NDJSON log you want to load into a tool that expects an array.

Array to NDJSON in Node:

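import fs from "node:fs";

// Parse the whole array, then serialize each element as one compact line.
const arr = JSON.parse(fs.readFileSync("data.json", "utf-8"));
// Terminate every record with "\n" so an empty array yields an empty file, not a blank line.
const out = arr.map((item) => JSON.stringify(item) + "\n").join("");
fs.writeFileSync("data.ndjson", out);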

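NDJSON back to an array in Node (reading the whole file at once is acceptable here, because the resulting array has to fit in memory anyway):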

import fs from "node:fs";

const lines = fs
  .readFileSync("data.ndjson", "utf-8")
  .split("\n")
  .filter((line) => line.trim() !== "");
const arr = lines.map((line) => JSON.parse(line));
fs.writeFileSync("data.json", JSON.stringify(arr, null, 2));

On the command line, the jq tool is excellent for this. Convert an array into NDJSON with the compact flag:

jq -c '.[]' data.json > data.ndjson

And fold NDJSON lines back into a single array with slurp mode:

jq -s '.' data.ndjson > data.json
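
The same two conversions can be sketched in Python. The function names array_to_ndjson and ndjson_to_array and the file names are hypothetical, and the demo round-trips a small sample through a temporary directory. Note that either direction still needs the full array in memory at some point.

```python
import json
import os
import tempfile


def array_to_ndjson(src_path, dst_path):
    """Read a JSON array file and write one compact record per line."""
    with open(src_path, encoding="utf-8") as src:
        items = json.load(src)  # the array itself must fit in memory
    with open(dst_path, "w", encoding="utf-8") as dst:
        for item in items:
            dst.write(json.dumps(item) + "\n")


def ndjson_to_array(src_path, dst_path):
    """Read NDJSON line by line and write a single JSON array file."""
    with open(src_path, encoding="utf-8") as src:
        items = [json.loads(line) for line in src if line.strip()]
    with open(dst_path, "w", encoding="utf-8") as dst:
        json.dump(items, dst, indent=2)


# Round trip in a temporary directory (hypothetical file names).
workdir = tempfile.mkdtemp()
array_path = os.path.join(workdir, "data.json")
ndjson_path = os.path.join(workdir, "data.ndjson")
back_path = os.path.join(workdir, "back.json")

original = [
    {"id": 1, "event": "login"},
    {"id": 2, "event": "purchase", "amount": 49.99},
]
with open(array_path, "w", encoding="utf-8") as f:
    json.dump(original, f)

array_to_ndjson(array_path, ndjson_path)
ndjson_to_array(ndjson_path, back_path)

with open(back_path, encoding="utf-8") as f:
    round_tripped = json.load(f)
print(round_tripped == original)
```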

For one-off conversions without writing code, paste your data into the NDJSON Converter and switch direction with a click.

Common pitfalls

A few mistakes show up again and again. First, do not wrap the records in array brackets. NDJSON has no [ or ] at the file level, and adding them makes the file invalid for line-by-line readers. Second, every line must be a complete, self-contained JSON value, so you cannot pretty-print records across multiple lines. A formatted object that spans several lines breaks the one-record-per-line rule. Keep each record compact.

WRONG (pretty-printed object spans lines):
{
  "id": 1
}

RIGHT (one record per line):
{"id": 1}

Third, watch the trailing newline. Most tools tolerate a final newline and many expect it, but a few choke on a blank final line, which is why the readers above skip empty lines. Fourth, mind the encoding: stick to UTF-8 and avoid a byte-order mark, since a leading BOM will make the first line fail to parse.
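
Several of these pitfalls can be guarded against in one reader. The following Python sketch is one possible approach, with a hypothetical function name read_ndjson_checked: it opens the file with the utf-8-sig codec, which drops a leading BOM if present, skips blank lines, and reports the line number of any record that fails to parse.

```python
import codecs
import json
import os
import tempfile


def read_ndjson_checked(path):
    """Yield (line_number, record), raising ValueError for malformed lines."""
    # "utf-8-sig" decodes UTF-8 and drops a leading BOM if one is present.
    with open(path, encoding="utf-8-sig") as f:
        for line_number, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines, including a trailing one
            try:
                yield line_number, json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"line {line_number}: {exc.msg}") from exc


# Demo file that starts with a BOM and ends with a blank line.
bom_path = os.path.join(tempfile.mkdtemp(), "bom.ndjson")
with open(bom_path, "wb") as f:
    f.write(codecs.BOM_UTF8 + b'{"id": 1}\n{"id": 2}\n\n')

checked = list(read_ndjson_checked(bom_path))
print(checked)
```

A pretty-printed record that spans several lines fails at its first line with this reader, which makes that mistake easy to locate.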

Conclusion

NDJSON is the format you reach for when JSON needs to stream, append, or scale past the size of memory. One valid JSON value per line gives you bounded-memory parsing, cheap appends, and graceful handling of partial files, which is exactly why logs, BigQuery loads, OpenAI batch files, and ML pipelines rely on it. Keep each line compact and valid, skip the array brackets, and the format will carry you from a few records to billions. When you need to flip between an array and newline-delimited form, the NDJSON Converter does it instantly, and you can explore the rest of the JSONPost tools for validating, transforming, and analyzing your data.
