# Why Everything Uses JSON Now (And Why It Shouldn't)

Topics: data formats, SGML, XML, JSON, YAML, Protocol Buffers, MessagePack, serialization history
Level: L1 (Foundations) · Time: 45–60 minutes · Prerequisites: None
## The Mission
Your CI pipeline fails. The error: `cannot unmarshal number into Go struct field .version of type string`. Your YAML config says `version: 1.20` but the parser sees the float `1.2` — the trailing zero vanished. You switch to JSON: `"version": 1.20` — same problem. JSON numbers are floats too. You add quotes: `"version": "1.20"`. It works. You've just discovered that data formats have opinions about your data.
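You can watch the trailing zero vanish in any Python REPL — a minimal sketch using only the standard library's `json` module:

```python
import json

# JSON has exactly one number type: an IEEE 754 double.
# The trailing zero is gone the moment the value is parsed.
doc = json.loads('{"version": 1.20}')
print(doc["version"])   # 1.2 — not 1.20

# Quoting the value keeps it a string, exactly as written.
doc = json.loads('{"version": "1.20"}')
print(doc["version"])   # 1.20
```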
Every API returns JSON. Every config is YAML or JSON. 20 years ago, it was all XML. Before that, nobody agreed on anything. How did JSON take over? What did it replace? What are its real limitations? And when should you use something else?
Quick taste of what's coming: Douglas Crockford didn't invent JSON — he noticed that JavaScript object literals were already a valid data format and wrote it down. The HTTP `Referer` header is a permanent typo from 1996. YAML was originally "Yet Another Markup Language" before being retronymed to "YAML Ain't Markup Language." And the 404 room at CERN? That story is completely made up.
## The Pre-History: Everyone Makes Up Their Own Format
Before standardized data formats, every system invented its own way to represent structured data:
```text
# /etc/passwd (1971) — colon-delimited
root:x:0:0:root:/root:/bin/bash

# CSV (1972) — comma-delimited, no real standard
name,age,city
"Smith, John",42,"New York"

# INI files (1980s) — sections and key-value pairs
[server]
host = 0.0.0.0
port = 8080

# Fixed-width (mainframes) — column positions matter
SMITH     JOHN      42  NEW YORK
```
Each format had one use case and broke when you needed anything more complex. Nested data? Lists inside objects? Schema validation? None of these formats could do it.
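Even the "simple" formats hide sharp edges. The CSV quoting problem is visible with Python's standard `csv` module — a delimiter inside a field forces quoting, and every value comes back as an untyped string:

```python
import csv
import io

# A CSV field containing the delimiter must be quoted — and the
# parser returns every field as a plain string, no types attached.
line = '"Smith, John",42,"New York"'
row = next(csv.reader(io.StringIO(line)))
print(row)  # ['Smith, John', '42', 'New York']
```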
## SGML and XML: The Enterprise Takes Over (1986–2005)
SGML (Standard Generalized Markup Language, 1986) was the first attempt at a universal document/data format. It was incredibly powerful and incredibly complex. HTML is a simplified subset of SGML.
XML (eXtensible Markup Language, 1998) simplified SGML into something practical:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<user>
  <name>Alice</name>
  <age>30</age>
  <roles>
    <role>admin</role>
    <role>developer</role>
  </roles>
  <address>
    <city>Portland</city>
    <state>OR</state>
  </address>
</user>
```
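Even reading a small document like this takes real machinery. A sketch with Python's standard `xml.etree.ElementTree` (the document is inlined so the example is self-contained):

```python
import xml.etree.ElementTree as ET

XML_DOC = """<?xml version="1.0" encoding="UTF-8"?>
<user>
  <name>Alice</name>
  <age>30</age>
  <roles><role>admin</role><role>developer</role></roles>
  <address><city>Portland</city><state>OR</state></address>
</user>"""

root = ET.fromstring(XML_DOC)
name = root.findtext("name")                  # 'Alice'
roles = [r.text for r in root.iter("role")]   # ['admin', 'developer']
age = int(root.findtext("age"))               # everything in XML is text —
                                              # numbers need explicit casts
print(name, age, roles)
```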
XML was everywhere in the 2000s: SOAP web services, RSS feeds, configuration files (Java's Spring, Maven, Ant), Microsoft Office formats (.docx is a zip of XML files), SVG graphics, Android layouts.
### What XML got right
- Self-describing (tags tell you what the data means)
- Namespaces (multiple schemas in one document)
- Schemas (XSD validates structure before processing)
- Ecosystem (XSLT for transformation, XPath for querying, XQuery for databases)
### What killed XML

- Verbosity. Closing tags repeat every element name — a JSON response is often 30–50% smaller.
- Complexity. SOAP, WSDL, XSD, XSLT, namespaces — the tooling stack was enormous.
- Parsing. XML parsers are complex (SAX, DOM, StAX). JSON parsing is trivial.
- Web developers hated it. JavaScript couldn't easily produce or consume XML. AJAX was originally "Asynchronous JavaScript and XML," but everyone switched to JSON.
Trivia: The term "AJAX" was coined by Jesse James Garrett in 2005. Despite the "X" standing for XML, the technique worked better with JSON. Within a few years, "AJAX" meant "asynchronous JavaScript" and the XML was quietly dropped.
## JSON: The Accidental Standard (2001–Present)
JSON (JavaScript Object Notation) was "discovered" (not invented) by Douglas Crockford around 2001. He noticed that JavaScript's object literal syntax was already a valid data format:
```json
{
  "name": "Alice",
  "age": 30,
  "roles": ["admin", "developer"],
  "address": {
    "city": "Portland",
    "state": "OR"
  }
}
```
Compare the same data in XML (above) vs JSON: fewer lines, no closing tags, no attributes vs elements debate, no namespace headaches.
### Why JSON won

- JavaScript native. `JSON.parse()` and `JSON.stringify()` — no library needed. The browser already speaks it.
- Simple spec. The entire JSON spec fits on one page. Six data types: string, number, boolean, null, array, object. That's it.
- Compact. No closing tags, no attributes. 30–50% smaller than equivalent XML.
- Universal. Every language has a JSON library. Python: `json.loads()`. Go: `json.Unmarshal()`. Ruby: `JSON.parse()`. There's nothing to learn.
- REST killed SOAP. RESTful APIs using JSON replaced SOAP web services using XML. REST was simpler, and JSON was its natural encoding.
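The "no library needed" point holds server-side too. In Python, the same Alice record round-trips through the standard `json` module in two calls:

```python
import json

user = {
    "name": "Alice",
    "age": 30,
    "roles": ["admin", "developer"],
    "address": {"city": "Portland", "state": "OR"},
}

encoded = json.dumps(user)      # one call to serialize
decoded = json.loads(encoded)   # one call to parse
assert decoded == user          # lossless round trip for these types
print(encoded)
```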
Trivia: Douglas Crockford didn't invent JSON — he documented what JavaScript was already doing. The first JSON spec (RFC 4627, 2006) was descriptive, not prescriptive. Crockford later wrote "JavaScript: The Good Parts" (2008), famous for being thin — the book's point was that most of JavaScript should be avoided.
### What JSON lacks

- No comments. You cannot comment a JSON file. This is deliberate — Crockford wanted JSON to be a pure data interchange format, not a configuration language. It's the single biggest reason people use YAML for configuration.
- No trailing commas. Adding an item to the end of a list requires modifying the previous line too (to add a comma). This creates noisy Git diffs.
- No date type. Dates are strings — everyone encodes them differently: `"2026-03-22"`, `"2026-03-22T14:30:00Z"`, `1711108800`.
- No binary data. Binary must be base64-encoded (33% size increase).
- Numbers are floats. JSON has one number type (IEEE 754 double). 64-bit integers lose precision: `9007199254740993` becomes `9007199254740992`. This has caused real bugs in financial systems.
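Two of these limits are easy to measure with the standard library. One caveat: Python's own `json` module parses integers exactly, so the precision loss appears in any consumer that stores JSON numbers as doubles — JavaScript's `Number`, for example:

```python
import base64
import json

# 1) Integer precision: 2**53 + 1 has no exact IEEE 754 double.
big = 9007199254740993
assert json.loads(json.dumps(big)) == big   # Python keeps it exact...
assert float(big) == 9007199254740992       # ...a double-based consumer does not

# 2) Binary data: base64 inflates every 3 raw bytes into 4 text characters.
raw = bytes(range(256))
encoded = base64.b64encode(raw)
print(len(raw), len(encoded))  # 256 344 — roughly 33% bigger
```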
## YAML: JSON's Superset for Humans (2001–Present)

YAML was designed to be a human-friendly data format. It is a superset of JSON — any valid JSON document is also valid YAML. (Strictly true since YAML 1.2, which was explicitly specified as a JSON superset; YAML 1.1 had minor edge cases.)
```yaml
# Look, I can have comments!
name: Alice
age: 30
roles:
  - admin
  - developer
address:
  city: Portland
  state: OR
```
YAML added: comments, multiline strings, anchors/aliases (DRY references), and indentation-based nesting (no braces). It became the default for Kubernetes, Ansible, Docker Compose, GitHub Actions, and most DevOps tools.
But YAML's design decisions created the footguns covered in the Why YAML Keeps Breaking Your Deploys lesson: implicit typing, the Norway problem, octal numbers, and the multiline string maze.
## Beyond JSON: The Binary Formats
JSON and YAML are text formats — human-readable but not fast to parse or compact to transmit. When performance matters, binary formats take over:
### Protocol Buffers (Google, 2008)
```protobuf
// user.proto — schema definition
message User {
  string name = 1;
  int32 age = 2;
  repeated string roles = 3;
  Address address = 4;
}
```
Protobuf is schema-first: define the structure, generate code in any language, serialize to compact binary. Used inside Google for virtually all inter-service communication.
| | JSON | Protobuf |
|---|---|---|
| Human-readable | Yes | No (binary) |
| Schema required | No | Yes (.proto file) |
| Parsing speed | Moderate | Very fast |
| Size | Baseline | 3–10x smaller |
| Language support | Universal | Code generation (many languages) |
| Backwards compatibility | Field names are the contract — renames break consumers | Field numbers are the contract — renames are wire-safe |
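Part of why Protobuf output is so compact is varint encoding for integers. A minimal sketch in Python of the unsigned case only (real Protobuf adds ZigZag encoding for signed fields):

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative int as a Protobuf-style varint:
    7 payload bits per byte, high bit set while more bytes follow."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)   # continuation bit: more to come
        else:
            out.append(byte)
            return bytes(out)

print(encode_varint(30).hex())    # '1e'   — age 30 costs a single byte
print(encode_varint(300).hex())   # 'ac02' — two bytes, vs. 3+ characters as text
```

Compare that with JSON, where `"age": 30` spends nine bytes on the field name alone.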
### MessagePack
Binary JSON — same data model, binary encoding. Drop-in replacement for JSON with 40-50% size reduction. No schema required. Used by Redis, Fluentd, and others.
### Apache Avro
Schema-embedded binary format used heavily in data pipelines (Kafka, Hadoop). The schema travels with the data, which makes schema evolution easier.
### When to use what
| Use case | Format | Why |
|---|---|---|
| REST APIs | JSON | Universal, human-debuggable |
| Config files | YAML | Comments, human-editable |
| Microservice-to-microservice | Protobuf / gRPC | Speed, type safety, schema |
| Data pipelines | Avro / Parquet | Schema evolution, columnar storage |
| Logging | JSON | Structured, parseable by log aggregators |
| Caching (Redis) | MessagePack | Compact, fast, JSON-compatible |
| Browser → server | JSON | Native JavaScript support |
| Mobile → server | Protobuf | Bandwidth-efficient |
## The Format Timeline
```text
1960s  Fixed-width records (mainframes)
1971   /etc/passwd (colon-delimited)
1972   CSV (never properly standardized)
1980s  INI files (Windows)
1986   SGML (enterprise, complex)
1991   HTML (simplified SGML for documents)
1998   XML (simplified SGML for data)
2000   SOAP + WSDL + XSD (enterprise web services)
2001   JSON "discovered" by Douglas Crockford
2001   YAML drafted ("Yet Another Markup Language")
2005   AJAX popularizes JSON in browsers
2006   JSON spec (RFC 4627)
2008   Protocol Buffers open-sourced by Google
2009   YAML 1.2 (stricter, but few parsers implement it)
2011   MessagePack 1.0
2013   JSON standardized as ECMA-404
2014   Kubernetes chooses YAML
2015   HTTP/2 + gRPC make Protobuf mainstream
2017   RFC 8259 (current JSON RFC, aligned with ECMA-404)
2020s  TOML, CUE, HCL, Pkl emerge for specific niches
```
## Flashcard Check
Q1: Why doesn't JSON support comments?
Deliberate design by Douglas Crockford. JSON is a data interchange format, not a configuration language. Comments are for humans editing files — JSON is for machines exchanging data.
Q2: What killed XML?
Verbosity (closing tags), complexity (SOAP/WSDL/XSD stack), and poor JavaScript integration. JSON was simpler, smaller, and native to the browser.
Q3: YAML is a superset of JSON. What does that mean?
Any valid JSON document is also valid YAML. You can paste JSON into a YAML file.
Q4: When should you use Protobuf instead of JSON?
Service-to-service communication where speed and size matter. Protobuf is 3-10x smaller and much faster to parse. Trade-off: not human-readable, requires schema.
Q5: JSON numbers are IEEE 754 doubles. Why does this matter?
64-bit integers lose precision: `9007199254740993` becomes `9007199254740992`. Financial and ID systems that use large integers must transmit them as strings.
## Exercises

### Exercise 1: Size comparison (hands-on)

Take the Alice user record from the examples above and encode it in JSON, YAML, XML, and a CSV approximation. Compare sizes.
### Exercise 2: The decision (think)
For each scenario, which format would you choose?
- Public REST API consumed by web browsers
- Configuration file edited by humans in vim
- High-throughput event stream (1M events/second)
- Log aggregation pipeline
- Data warehouse ingestion from multiple sources
Answers:

1. **JSON** — universal, native to browsers, every client library supports it.
2. **YAML** (or TOML) — comments, human-readable, indentation-based.
3. **Protobuf or Avro** — binary, fast parsing, compact. JSON at 1M/sec wastes bandwidth.
4. **JSON** — structured logging tools (Loki, ELK, Splunk) all parse JSON natively.
5. **Avro or Parquet** — schema evolution for changing sources, columnar storage for analytics.

## Cheat Sheet
| Format | Human-readable | Comments | Schema | Binary | Best for |
|---|---|---|---|---|---|
| JSON | Yes | No | Optional (JSON Schema) | No | APIs, logs |
| YAML | Yes | Yes | No | No | Config files |
| XML | Yes | Yes | Yes (XSD) | No | Legacy, documents |
| Protobuf | No | N/A | Required (.proto) | Yes | RPC, microservices |
| MessagePack | No | N/A | No | Yes | Caching, compact JSON |
| Avro | No | N/A | Embedded | Yes | Data pipelines |
| CSV | Mostly | No | No | No | Tabular data export |
| TOML | Yes | Yes | No | No | Simple config (Cargo, pyproject) |
## Takeaways

- **JSON won because it's simple, not because it's best.** Six data types, one-page spec, native to JavaScript. Simplicity beat XML's power.
- **JSON is terrible for configuration.** No comments. Use YAML (with caution) or TOML for human-edited config.
- **Binary formats exist for a reason.** Protobuf is 3–10x smaller and faster. If you're doing service-to-service at scale, JSON's human-readability isn't worth the overhead.
- **Every format is a trade-off.** Human-readable vs compact. Schema-required vs flexible. Universal vs optimized. Choose based on who reads it and how fast it needs to be.
## Related Lessons

- Why YAML Keeps Breaking Your Deploys — YAML's specific footguns in depth
- What Happens When You `kubectl apply` — where YAML becomes Kubernetes objects