Why Everything Uses JSON Now (And Why It Shouldn't)


Topics: data formats, SGML, XML, JSON, YAML, Protocol Buffers, MessagePack, serialization history
Level: L1 (Foundations)
Time: 45–60 minutes
Prerequisites: None


The Mission

Your CI pipeline fails. The error: cannot unmarshal number into Go struct field .version of type string. Your YAML config says version: 1.20 but the parser sees float 1.2 — the trailing zero vanished. You switch to JSON: "version": 1.20 — same problem. JSON numbers are floats too. You add quotes: "version": "1.20". It works. You've just discovered that data formats have opinions about your data.
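The mission bug is easy to reproduce with Python's standard-library json module (the YAML case behaves the same way under common parsers):

```python
import json

# A bare number is parsed as a float: the trailing zero is gone.
config = json.loads('{"version": 1.20}')
print(config["version"])   # 1.2

# Quoting turns it into a string, preserving the exact text.
config = json.loads('{"version": "1.20"}')
print(config["version"])   # 1.20
```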

Every API returns JSON. Every config is YAML or JSON. 20 years ago, it was all XML. Before that, nobody agreed on anything. How did JSON take over? What did it replace? What are its real limitations? And when should you use something else?

Quick taste of what's coming: Douglas Crockford didn't invent JSON — he noticed that JavaScript object literals were already a valid data format and wrote it down. The HTTP Referer header is a permanent typo from 1996. YAML was originally "Yet Another Markup Language" before being backronymed to "YAML Ain't Markup Language." And the 404 room at CERN? That story is completely made up.


The Pre-History: Everyone Makes Up Their Own Format

Before standardized data formats, every system invented its own way to represent structured data:

# /etc/passwd (1971) — colon-delimited
root:x:0:0:root:/root:/bin/bash

# CSV (1972) — comma-delimited, no real standard
name,age,city
"Smith, John",42,"New York"

# INI files (1980s) — sections and key-value pairs
[server]
host = 0.0.0.0
port = 8080

# Fixed-width (mainframes) — column positions matter
SMITH   JOHN    42  NEW YORK

Each format had one use case and broke when you needed anything more complex. Nested data? Lists inside objects? Schema validation? None of these formats could do it.
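One convention CSV did settle in practice: quoting lets the delimiter live inside a field. A quick check with Python's stdlib csv module, which follows the (much later) RFC 4180 conventions:

```python
import csv
import io

raw = 'name,age,city\n"Smith, John",42,"New York"\n'
rows = list(csv.reader(io.StringIO(raw)))
# The quoted comma survives as part of the field, not as a delimiter.
print(rows[1])   # ['Smith, John', '42', 'New York']
```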


SGML and XML: The Enterprise Takes Over (1986–2005)

SGML (Standard Generalized Markup Language, 1986) was the first attempt at a universal document/data format. It was incredibly powerful and incredibly complex. HTML was originally defined as an application of SGML (a specific document type, not a subset).

XML (eXtensible Markup Language, 1998) simplified SGML into something practical:

<?xml version="1.0" encoding="UTF-8"?>
<user>
    <name>Alice</name>
    <age>30</age>
    <roles>
        <role>admin</role>
        <role>developer</role>
    </roles>
    <address>
        <city>Portland</city>
        <state>OR</state>
    </address>
</user>
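Reading the document above takes more machinery than JSON will, but Python's standard library handles it; a minimal sketch:

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<user><name>Alice</name><age>30</age>"
    "<roles><role>admin</role><role>developer</role></roles></user>"
)
# Text content must be extracted node by node; there is no native dict mapping.
print(doc.findtext("name"))                         # Alice
print([r.text for r in doc.findall("roles/role")])  # ['admin', 'developer']
```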

XML was everywhere in the 2000s: SOAP web services, RSS feeds, configuration files (Java's Spring, Maven, Ant), Microsoft Office formats (.docx is a zip of XML files), SVG graphics, Android layouts.

What XML got right

  • Self-describing (tags tell you what the data means)
  • Namespaces (multiple schemas in one document)
  • Schemas (XSD validates structure before processing)
  • Ecosystem (XSLT for transformation, XPath for querying, XQuery for databases)

What killed XML

  • Verbosity. The closing tags double the size. A JSON response is often 30-50% smaller.
  • Complexity. SOAP, WSDL, XSD, XSLT, namespaces — the tooling stack was enormous.
  • Parsing. XML parsers are complex (SAX, DOM, StAX). JSON parsing is trivial.
  • Web developers hated it. JavaScript couldn't easily produce or consume XML. AJAX was originally "Asynchronous JavaScript and XML" but everyone switched to JSON.

Trivia: The term "AJAX" was coined by Jesse James Garrett in 2005. Despite the "X" standing for XML, the technique worked better with JSON. Within a few years, "AJAX" meant "asynchronous JavaScript" and the XML was quietly dropped.


JSON: The Accidental Standard (2001–Present)

JSON (JavaScript Object Notation) was "discovered" (not invented) by Douglas Crockford around 2001. He noticed that JavaScript's object literal syntax was already a valid data format:

{
    "name": "Alice",
    "age": 30,
    "roles": ["admin", "developer"],
    "address": {
        "city": "Portland",
        "state": "OR"
    }
}

Compare the same data in XML (above) vs JSON: fewer lines, no closing tags, no attributes vs elements debate, no namespace headaches.

Why JSON won

  1. JavaScript native. JSON.parse() and JSON.stringify() — no library needed. The browser already speaks it.
  2. Simple spec. The entire JSON spec fits on one page. Six data types: string, number, boolean, null, array, object. That's it.
  3. Compact. No closing tags, no attributes. 30-50% smaller than equivalent XML.
  4. Universal. Every language has a JSON library. Python: json.loads(). Go: json.Unmarshal(). Ruby: JSON.parse(). There's nothing to learn.
  5. REST killed SOAP. RESTful APIs using JSON replaced SOAP web services using XML. REST was simpler, and JSON was its natural encoding.
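Point 4 in action: Python's standard library round-trips the example object with nothing to configure.

```python
import json

user = {"name": "Alice", "age": 30, "roles": ["admin", "developer"],
        "address": {"city": "Portland", "state": "OR"}}
wire = json.dumps(user)          # serialize to a JSON string
assert json.loads(wire) == user  # parse it back losslessly
```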

Trivia: Douglas Crockford didn't invent JSON — he documented what JavaScript was already doing. The first JSON spec (RFC 4627, 2006) was descriptive, not prescriptive. Crockford later wrote "JavaScript: The Good Parts" (2008), famous for being thin — the book's point was that most of JavaScript should be avoided.

What JSON lacks

  • No comments. You cannot comment a JSON file. This is deliberate — Crockford wanted JSON to be a pure data interchange format, not a configuration language. It's the single biggest reason people use YAML for configuration.
  • No trailing commas. Adding an item to the end of a list requires modifying the previous line too (to add a comma). This creates noisy Git diffs.
  • No date type. Dates are strings — everyone encodes them differently ("2026-03-22", "2026-03-22T14:30:00Z", 1711108800).
  • No binary data. Binary must be base64-encoded (33% size increase).
  • Numbers are floats. JSON has one number type (IEEE 754 double). 64-bit integers lose precision: 9007199254740993 becomes 9007199254740992. This has caused real bugs in financial systems.
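The float limitation is visible from Python, with one caveat: Python's own json module parses integers exactly, so the precision loss bites consumers (JavaScript, many databases) whose JSON numbers are IEEE 754 doubles. Forcing the value through a double shows the rounding:

```python
n = 9007199254740993    # 2**53 + 1, one past exact-double territory
print(float(n))         # 9007199254740992.0: the double rounds to the nearest even
# In JavaScript: JSON.parse("9007199254740993") === 9007199254740992
# The common fix is to transmit large IDs as strings: {"id": "9007199254740993"}
```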

YAML: JSON's Superset for Humans (2001–Present)

YAML was designed to be a human-friendly data format. Since YAML 1.2 it is a superset of JSON: any valid JSON document is also valid YAML (YAML 1.1 had a few edge cases).

# Look, I can have comments!
name: Alice
age: 30
roles:
  - admin
  - developer
address:
  city: Portland
  state: OR

YAML added: comments, multiline strings, anchors/aliases (DRY references), and indentation-based nesting (no braces). It became the default for Kubernetes, Ansible, Docker Compose, GitHub Actions, and most DevOps tools.

But YAML's design decisions created the footguns covered in the Why YAML Keeps Breaking Your Deploys lesson: implicit typing, the Norway problem, octal numbers, and the multiline string maze.
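A taste of those footguns, as a config fragment (the behavior shown is the common YAML 1.1 parser behavior; strict YAML 1.2 parsers fix some of it):

```yaml
countries:
  - NO        # many YAML 1.1 parsers read this as boolean false (the "Norway problem")
  - "NO"      # quoting forces the string
version: 1.20 # parsed as float 1.2; quote it to keep the trailing zero
```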


Beyond JSON: The Binary Formats

JSON and YAML are text formats — human-readable but not fast to parse or compact to transmit. When performance matters, binary formats take over:

Protocol Buffers (Google, 2008)

// user.proto — schema definition
syntax = "proto3";

message User {
    string name = 1;
    int32 age = 2;
    repeated string roles = 3;
    Address address = 4;
}

Protobuf is schema-first: define the structure, generate code in any language, serialize to compact binary. Used inside Google for virtually all inter-service communication.
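Why the binary is so compact: each field is encoded as a key byte (field number and wire type packed together) followed by the value. A hand-rolled sketch of the varint encoding for age = 30 (field 2), no protobuf library required:

```python
field_number, wire_type = 2, 0        # wire type 0 = varint (used for scalar ints)
key = (field_number << 3) | wire_type # field number and wire type share one byte
value = 30                            # values under 128 fit in a single varint byte
encoded = bytes([key, value])
print(encoded.hex())                  # 101e: the whole field in two bytes
```

The field name "age" never appears on the wire, which is why field numbers, not names, are the compatibility contract.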

|                        | JSON                                          | Protobuf                                         |
|------------------------|-----------------------------------------------|--------------------------------------------------|
| Human-readable         | Yes                                           | No (binary)                                      |
| Schema required        | No                                            | Yes (.proto file)                                |
| Parsing speed          | Moderate                                      | Very fast                                        |
| Size                   | Baseline                                      | 3-10x smaller                                    |
| Language support       | Universal                                     | Code generation (many languages)                 |
| Backwards compatibility| No enforced contract; renames silently break consumers | Field numbers are the contract; names can change |

MessagePack

Binary JSON — same data model, binary encoding. Drop-in replacement for JSON with 40-50% size reduction. No schema required. Used by Redis, Fluentd, and others.

Apache Avro

Schema-embedded binary format used heavily in data pipelines (Kafka, Hadoop). The schema travels with the data, which makes schema evolution easier.

When to use what

| Use case                     | Format          | Why                                      |
|------------------------------|-----------------|------------------------------------------|
| REST APIs                    | JSON            | Universal, human-debuggable              |
| Config files                 | YAML            | Comments, human-editable                 |
| Microservice-to-microservice | Protobuf / gRPC | Speed, type safety, schema               |
| Data pipelines               | Avro / Parquet  | Schema evolution, columnar storage       |
| Logging                      | JSON            | Structured, parseable by log aggregators |
| Caching (Redis)              | MessagePack     | Compact, fast, JSON-compatible           |
| Browser → server             | JSON            | Native JavaScript support                |
| Mobile → server              | Protobuf        | Bandwidth-efficient                      |

The Format Timeline

1960s   Fixed-width records (mainframes)
1971    /etc/passwd (colon-delimited)
1972    CSV (never properly standardized)
1980s   INI files (Windows)
1986    SGML (enterprise, complex)
1996    HTML (simplified SGML for documents)
1998    XML (simplified SGML for data)
2000    SOAP + WSDL + XSD (enterprise web services)
2001    JSON "discovered" by Douglas Crockford
2001    YAML 1.0 ("Yet Another Markup Language")
2005    AJAX popularizes JSON in browsers
2006    JSON spec (RFC 4627)
2008    Protocol Buffers (Google, open-sourced)
2009    YAML 1.2 (stricter, but few parsers implement it)
2011    MessagePack 1.0
2014    Kubernetes chooses YAML
2015    HTTP/2 + gRPC makes Protobuf mainstream
2017    JSON re-standardized as RFC 8259 (aligned with ECMA-404, 2013)
2020s   TOML, CUE, HCL, Pkl emerge for specific niches

Flashcard Check

Q1: Why doesn't JSON support comments?

Deliberate design by Douglas Crockford. JSON is a data interchange format, not a configuration language. Comments are for humans editing files — JSON is for machines exchanging data.

Q2: What killed XML?

Verbosity (closing tags), complexity (SOAP/WSDL/XSD stack), and poor JavaScript integration. JSON was simpler, smaller, and native to the browser.

Q3: YAML is a superset of JSON. What does that mean?

Any valid JSON document is also valid YAML (true since YAML 1.2). You can paste JSON into a YAML file.

Q4: When should you use Protobuf instead of JSON?

Service-to-service communication where speed and size matter. Protobuf is 3-10x smaller and much faster to parse. Trade-off: not human-readable, requires schema.

Q5: JSON numbers are IEEE 754 doubles. Why does this matter?

64-bit integers lose precision. 9007199254740993 becomes 9007199254740992. Financial and ID systems that use large integers must transmit them as strings.


Exercises

Exercise 1: Size comparison (hands-on)

Take this data and encode it in JSON, YAML, XML, and a CSV approximation. Compare sizes.

Three users: Alice (admin, 30, Portland), Bob (user, 25, Seattle), Carol (admin, 35, Austin)

Exercise 2: The decision (think)

For each scenario, which format would you choose?

  1. Public REST API consumed by web browsers
  2. Configuration file edited by humans in vim
  3. High-throughput event stream (1M events/second)
  4. Log aggregation pipeline
  5. Data warehouse ingestion from multiple sources
Answers

  1. JSON — universal, native to browsers, every client library supports it.
  2. YAML (or TOML) — comments, human-readable, indentation-based.
  3. Protobuf or Avro — binary, fast parsing, compact. JSON at 1M/sec wastes bandwidth.
  4. JSON — structured logging tools (Loki, ELK, Splunk) all parse JSON natively.
  5. Avro or Parquet — schema evolution for changing sources, columnar storage for analytics.

Cheat Sheet

| Format      | Human-readable | Comments | Schema                 | Binary | Best for                        |
|-------------|----------------|----------|------------------------|--------|---------------------------------|
| JSON        | Yes            | No       | Optional (JSON Schema) | No     | APIs, logs                      |
| YAML        | Yes            | Yes      | No                     | No     | Config files                    |
| XML         | Yes            | Yes      | Yes (XSD)              | No     | Legacy, documents               |
| Protobuf    | No             | N/A      | Required (.proto)      | Yes    | RPC, microservices              |
| MessagePack | No             | N/A      | No                     | Yes    | Caching, compact JSON           |
| Avro        | No             | N/A      | Embedded               | Yes    | Data pipelines                  |
| CSV         | Mostly         | No       | No                     | No     | Tabular data export             |
| TOML        | Yes            | Yes      | No                     | No     | Simple config (Cargo, pyproject)|

Takeaways

  1. JSON won because it's simple, not because it's best. Six data types, one-page spec, native to JavaScript. Simplicity beat XML's power.

  2. JSON is terrible for configuration. No comments. Use YAML (with caution) or TOML for human-edited config.

  3. Binary formats exist for a reason. Protobuf is 3-10x smaller and faster. If you're doing service-to-service at scale, JSON's human-readability isn't worth the overhead.

  4. Every format is a trade-off. Human-readable vs compact. Schema-required vs flexible. Universal vs optimized. Choose based on who reads it and how fast it needs to be.


  • Why YAML Keeps Breaking Your Deploys — YAML's specific footguns in depth
  • What Happens When You kubectl apply — where YAML becomes Kubernetes objects