

YAML, JSON & Config Formats - Primer

Why This Matters

Every piece of infrastructure you touch — Kubernetes manifests, CI/CD pipelines, Terraform configs, Docker Compose files, application settings — is expressed in a structured data format. Misunderstanding the format is the root cause of a disproportionate number of deployment failures. A single misplaced space in YAML, a trailing comma in JSON, or an unquoted boolean can take down a production deploy. Knowing these formats deeply is not optional for DevOps work.


YAML

Name origin: YAML originally stood for "Yet Another Markup Language" (2001), but was later redefined as the recursive backronym "YAML Ain't Markup Language" to emphasize that it is about data serialization, not document markup. It was created by Clark Evans, Ingy döt Net, and Oren Ben-Kiki.

YAML (YAML Ain't Markup Language) is the dominant config format in the DevOps ecosystem. Kubernetes, Ansible, GitHub Actions, Docker Compose, Helm, and most CI systems use it.

Scalars

Scalars are single values: strings, numbers, booleans, null.

# Strings — quotes optional unless ambiguous
name: nginx
version: "1.25"              # quote to force string (otherwise 1.25 is a float)
description: 'literal \n stays'   # single quotes: no escape processing
message: "line one\nline two"     # double quotes: \n becomes newline

# Numbers
replicas: 3                  # integer
cpu_limit: 1.5               # float
octal_value: 0o777           # YAML 1.2 octal (note the 0o prefix)
octal_legacy: 0777           # YAML 1.1 octal — this is 511 in decimal!

# Booleans — WARNING: yes/no/on/off are also booleans in YAML 1.1
enabled: true
debug: false

# Null
value: null
also_null: ~
empty_null:                  # implicit null (key with no value)

Sequences and Mappings

# Sequences (lists) — block and flow style
ports:
  - 80
  - 443
ports_inline: [80, 443]       # flow style (inline); renamed here because a mapping cannot repeat a key

# Sequence of mappings (very common in k8s)
containers:
  - name: app
    image: nginx:1.25
    ports:
      - containerPort: 80
  - name: sidecar
    image: envoy:1.28

# Mappings — nested arbitrarily
metadata:
  name: my-app
  labels: {app: my-app, tier: frontend}   # flow style for short mappings
spec:
  template:
    spec:
      containers:
        - name: app
          resources:
            limits:
              cpu: "500m"
              memory: "128Mi"

Multiline Strings

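Two block scalar styles, literal (|) and folded (>), combine with a chomping indicator that controls trailing newlines: none (clip: keep one), - (strip: keep none), or + (keep: preserve all).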

# | (literal) — preserves newlines, adds one trailing newline
script: |
  #!/bin/bash
  echo "Hello"
# Result: "#!/bin/bash\necho \"Hello\"\n"

# |- (literal strip) — preserves newlines, NO trailing newline
script: |-
  #!/bin/bash
  echo "Hello"
# Result: "#!/bin/bash\necho \"Hello\""

# > (folded) — newlines become spaces (wraps paragraphs), one trailing newline
description: >
  This is a long description
  that wraps across lines.
# Result: "This is a long description that wraps across lines.\n"

# >- (folded strip) — folded, no trailing newline
# |+ (literal keep) — preserves ALL trailing newlines including blank lines

Anchors, Aliases, and Merge Keys

Anchors (&) define reusable nodes. Aliases (*) reference them. Merge keys (<<) merge mappings.

defaults: &app-defaults
  replicas: 3
  image: myapp:latest
  resources:
    limits:
      cpu: "500m"
      memory: "256Mi"

staging:
  <<: *app-defaults           # merge all keys from defaults
  replicas: 1                 # override specific values

production:
  <<: *app-defaults
  replicas: 5
  image: myapp:v2.1.0         # override image

# Scalar anchor
default_port: &port 8080
services:
  web:
    port: *port               # resolves to 8080

Merge key override order: explicit keys in the current mapping always win. When merging multiple sources (<<: [*a, *b]), the first mapping takes precedence for duplicate keys. See Footguns for edge cases.
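The override order described above can be modeled with plain dictionaries. This is a minimal sketch of the semantics only, not how any YAML parser is implemented; the helper name merge_mappings and the sample mappings a and b are hypothetical.

```python
def merge_mappings(explicit, *sources):
    """Emulate YAML `<<` merging: explicit keys win, then earlier sources."""
    merged = {}
    # Apply sources last-to-first so earlier sources overwrite later ones.
    for source in reversed(sources):
        merged.update(source)
    # Keys written directly in the mapping always take precedence.
    merged.update(explicit)
    return merged


app_defaults = {
    "replicas": 3,
    "image": "myapp:latest",
    "resources": {"limits": {"cpu": "500m", "memory": "256Mi"}},
}

# Equivalent of:  staging: {<<: *app-defaults, replicas: 1}
staging = merge_mappings({"replicas": 1}, app_defaults)

# Equivalent of:  <<: [*a, *b]   (hypothetical mappings for illustration)
a = {"replicas": 2, "region": "us-east-1"}
b = {"replicas": 9, "tier": "backend"}
combined = merge_mappings({}, a, b)
```

Note that the merge is shallow: staging shares the whole resources mapping with the defaults. Overriding resources in staging would replace the nested mapping entirely rather than merge into it.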

Tags

Tags force a specific type: !!str NO (string, not boolean), !!str 8080 (string, not integer), !!float 3 (3.0, not integer). Useful for preventing type misinterpretation.

Document Separators

--- starts a new document within one file. ... optionally ends a document. Multiple documents in one file are common in Kubernetes (kubectl apply -f multi.yaml).

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-one
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-two

Complex Keys

YAML supports non-string keys, including sequences and mappings:

? [web, staging]
: deploy-web-staging.sh

# Integer keys
200: OK
404: Not Found

Flow Style vs Block Style

# Block style (idiomatic YAML — uses indentation)
metadata:
  labels:
    app: web
    tier: frontend

# Flow style (compact — looks like JSON)
metadata: {labels: {app: web, tier: frontend}}

# Mixed (common in practice)
metadata:
  labels: {app: web, tier: frontend}
  annotations:
    - description: "main app"

YAML 1.1 vs 1.2 Differences

Behavior    | YAML 1.1                       | YAML 1.2
Booleans    | yes/no/on/off/y/n are booleans | Only true and false
Octals      | 0777 = 511                     | Must use 0o777
Sexagesimal | 1:30 = 90                      | String "1:30"

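Who uses what: PyYAML and SnakeYAML default to 1.1. Go yaml.v3, ruamel.yaml, and yq target 1.2, though yaml.v3 keeps some 1.1 behaviors for backward compatibility, so test the parser you actually deploy with.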

Truthy Values Gotcha

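In YAML 1.1, y/Y/yes/Yes/YES/on/On/ON/true/True/TRUE all resolve to boolean true, and the matching negative forms (n/N/no/No/NO/off/Off/OFF/false/False/FALSE) resolve to false. This is one of the most common sources of YAML bugs: a country code such as NO or an answer field such as yes silently becomes a boolean. Always quote strings that could be misinterpreted.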
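One way to guard against this when generating YAML is to check values against the YAML 1.1 boolean spellings before writing them. The sketch below hard-codes the list from the YAML 1.1 boolean type; the function names are hypothetical, and individual parsers may recognize a narrower set.

```python
import re

# Plain scalars that the YAML 1.1 boolean type resolves to true or false.
YAML11_BOOL = re.compile(
    r"y|Y|yes|Yes|YES|n|N|no|No|NO"
    r"|true|True|TRUE|false|False|FALSE"
    r"|on|On|ON|off|Off|OFF"
)


def is_yaml11_bool(text: str) -> bool:
    """Return True if an unquoted `text` would be read as a boolean."""
    return YAML11_BOOL.fullmatch(text) is not None


def quote_if_ambiguous(text: str) -> str:
    """Wrap the value in double quotes only when it would change type."""
    return f'"{text}"' if is_yaml11_bool(text) else text
```

For example, quote_if_ambiguous("NO") returns the quoted form, while quote_if_ambiguous("nginx") returns the value unchanged.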


JSON

JSON (JavaScript Object Notation) is the universal data interchange format. Every language has a parser. Every API speaks it.

Structure and Data Types

{
  "string": "hello world",
  "integer": 42,
  "float": 3.14,
  "boolean": true,
  "null_value": null,
  "array": [1, 2, 3],
  "nested_object": {
    "key": "value",
    "deep": {
      "level": 3
    }
  }
}

Six types only: string, number, boolean, null, array, object. No dates, no binary, no comments.

Strict Syntax Rules

JSON is unforgiving. All of these are syntax errors: unquoted keys, single quotes, trailing commas, comments, hex literals, leading +, leading . in decimals, Infinity, NaN.

{name: "app"}              // INVALID: keys must be double-quoted
{"ports": [80, 443,]}      // INVALID: trailing comma
{/* comment */}             // INVALID: no comments
{"value": .5}              // INVALID: must be 0.5
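A quick way to confirm these rules is to feed the same four documents to a strict parser. This sketch uses Python's standard json module; the helper name is_valid_json is hypothetical.

```python
import json

# The four invalid documents from the examples above.
invalid_documents = [
    '{name: "app"}',            # unquoted key
    '{"ports": [80, 443,]}',    # trailing comma
    '{/* comment */}',          # comment
    '{"value": .5}',            # missing leading zero
]


def is_valid_json(text: str) -> bool:
    """Return True if `text` parses as JSON, False on a syntax error."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


results = {doc: is_valid_json(doc) for doc in invalid_documents}
```

One caveat: Python's json module accepts NaN and Infinity by default, so it is slightly more lenient than the JSON specification on those two points.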

JSON5 and JSONC

Because strict JSON is painful for config files, two extensions exist:

JSON5 — human-friendly superset: allows comments, trailing commas, unquoted keys, single quotes, hex, Infinity. Used by some build tools.

JSONC: JSON with Comments. Used by VS Code (settings.json), TypeScript (tsconfig.json). Allows // and /* */ comments; some parsers, such as the one that reads tsconfig.json, also accept trailing commas. Nothing else changes.

JSON Schema

JSON Schema validates the structure and content of JSON (and YAML) documents. Used by Kubernetes CRDs, Helm values validation, VS Code settings, GitHub Actions, and OpenAPI.

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "required": ["name", "version"],
  "properties": {
    "name": {"type": "string", "pattern": "^[a-z][a-z0-9-]*$"},
    "version": {"type": "string", "pattern": "^\\d+\\.\\d+\\.\\d+$"},
    "replicas": {"type": "integer", "minimum": 1, "maximum": 100},
    "ports": {
      "type": "array",
      "items": {"type": "integer", "minimum": 1, "maximum": 65535},
      "uniqueItems": true
    }
  },
  "additionalProperties": false
}

JSON Pointer and JSON Patch

JSON Pointer (RFC 6901) references a specific value: /metadata/name, /spec/containers/0, /paths/~1api~1v1 (slash escaped as ~1).
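Pointer resolution is simple enough to sketch in a few lines. The function below is an illustrative reading of RFC 6901, not a complete implementation (it does not validate array indexes strictly); the name resolve_pointer and the sample document are hypothetical.

```python
def resolve_pointer(document, pointer: str):
    """Resolve a JSON Pointer against already-parsed JSON data."""
    if pointer == "":
        return document  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError("a non-empty JSON Pointer must start with '/'")
    current = document
    for raw_token in pointer[1:].split("/"):
        # Unescape ~1 before ~0 so that "~01" decodes to "~1", not "/".
        token = raw_token.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            current = current[int(token)]   # array index
        else:
            current = current[token]        # object member
    return current


doc = {
    "metadata": {"name": "my-app"},
    "spec": {"containers": [{"name": "app"}]},
    "paths": {"/api/v1": {"get": "listThings"}},
}
```

With this document, resolve_pointer(doc, "/paths/~1api~1v1") looks up the key "/api/v1", which is why the slash has to be escaped inside a pointer.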

JSON Patch (RFC 6902) describes mutations as operations:

[
  {"op": "replace", "path": "/spec/replicas", "value": 5},
  {"op": "add", "path": "/metadata/labels/version", "value": "v2"},
  {"op": "remove", "path": "/metadata/annotations/old-key"},
  {"op": "test", "path": "/metadata/name", "value": "my-app"}
]

Used by: kubectl patch --type=json and Kustomize JSON 6902 patches. Kubernetes strategic merge patch is a separate, Kubernetes-specific format and is not JSON Patch.

NDJSON / JSON Lines

Newline-delimited JSON — one JSON object per line. No wrapping array, no commas between records.

{"timestamp":"2024-01-15T10:00:00Z","level":"info","msg":"started"}
{"timestamp":"2024-01-15T10:00:01Z","level":"error","msg":"connection refused"}
{"timestamp":"2024-01-15T10:00:02Z","level":"info","msg":"retry succeeded"}

Used by: structured logging (Docker, Fluentd, CloudWatch), streaming APIs, Elasticsearch bulk API. jq processes NDJSON natively (one object per line). Convert to array with jq -s '.'.
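Because each line is a complete JSON document, NDJSON can be processed one record at a time without loading the whole file. A minimal sketch using the log lines above (the helper name iter_records is hypothetical):

```python
import json

ndjson_text = """\
{"timestamp":"2024-01-15T10:00:00Z","level":"info","msg":"started"}
{"timestamp":"2024-01-15T10:00:01Z","level":"error","msg":"connection refused"}
{"timestamp":"2024-01-15T10:00:02Z","level":"info","msg":"retry succeeded"}
"""


def iter_records(text: str):
    """Yield one parsed object per non-blank line."""
    for line in text.splitlines():
        if line.strip():
            yield json.loads(line)


# Stream-filter the records, similar in spirit to: jq 'select(.level == "error")'
errors = [r["msg"] for r in iter_records(ndjson_text) if r["level"] == "error"]

# Collect everything into one array, similar in spirit to: jq -s '.'
as_array = list(iter_records(ndjson_text))
```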


TOML

Name origin: TOML is named after its creator Tom Preston-Werner, co-founder of GitHub. "Tom's Obvious Minimal Language" — the name is a statement of intent: no ambiguity, no surprises.

TOML (Tom's Obvious Minimal Language) is designed to be unambiguous. Used by Rust (Cargo.toml), Python (pyproject.toml), Hugo, and increasingly in modern tooling.

Basics

# This is a comment (TOML has comments!)

title = "My App"
version = "1.0.0"
debug = false
port = 8080

# Strings: double-quoted (escapes), single-quoted (literal), triple-quoted (multiline)
name = "basic string with \n escapes"
path = 'C:\no\escape\in\literal'
multi = """
  multiline basic
  string here"""

# Numbers — unambiguous, no octal/truthy traps
int_val = 42
hex = 0xDEADBEEF
octal = 0o755
underscored = 1_000_000      # visual separator

# Dates and times (first-class types!)
datetime = 2024-01-15T10:30:00Z
date = 2024-01-15

Tables (Sections)

[server]
host = "0.0.0.0"
port = 8080

[server.tls]                          # nested table via dotted key
cert = "/etc/ssl/cert.pem"

[database]
host = "db.example.com"
port = 5432

point = {x = 1, y = 2}               # inline table (single line only)

Arrays of Tables

# Each [[section]] creates a new element in an array
[[server]]
host = "web1.example.com"
port = 8080

[[server]]
host = "web2.example.com"
port = 8081
# Equivalent JSON: {"server": [{"host": "web1...", "port": 8080}, {"host": "web2...", "port": 8081}]}

When to Use TOML vs YAML

Use TOML for flat/shallow application config (pyproject.toml, Cargo.toml). Strong types, no truthy gotchas, hard to break. Gets verbose with deep nesting.

Use YAML for Kubernetes, CI/CD, Ansible — anything with deep nesting or multiple documents. Massive tooling ecosystem but easy to break.


INI

INI files are the simplest structured format. Still used by systemd, git config, pip, MySQL, PHP, and many legacy systems.

Format

; Comments use ; or #
[database]
host = localhost
port = 3306
name = myapp
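; the next value is read as the STRING "true", not a boolean
; (keep comments on their own line: not every parser accepts inline comments,
; and Python's configparser does not by default)
debug = true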

Limitations: no nesting beyond one level, all values are strings, no standard for lists/arrays/escaping, duplicate section handling varies by parser.

systemd Unit File Syntax

systemd uses an INI-like format with extensions (repeated keys allowed, variable expansion):

[Unit]
Description=My Application Service
After=network.target postgresql.service

[Service]
Type=simple
User=appuser
WorkingDirectory=/opt/myapp
ExecStart=/usr/bin/python3 -m uvicorn app:main --host 0.0.0.0 --port 8000
Restart=on-failure
RestartSec=5
EnvironmentFile=/etc/myapp/config.env
MemoryMax=512M

[Install]
WantedBy=multi-user.target

Python configparser

import configparser
config = configparser.ConfigParser()
config.read('app.ini')

host = config['database']['host']              # 'localhost' (string)
port = config.getint('database', 'port')       # 3306 (typed accessor)
debug = config.getboolean('database', 'debug') # True (parses string)
# DEFAULT section values are inherited by all sections
# %(key)s interpolation supported: log = %(base)s/logs
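The DEFAULT inheritance and %(key)s interpolation mentioned in the comments above can be shown with an in-memory file. The base path and the log key are hypothetical values chosen for illustration.

```python
import configparser

ini_text = """
[DEFAULT]
base = /var/lib/myapp

[database]
host = localhost
port = 3306
log = %(base)s/logs
"""

config = configparser.ConfigParser()
config.read_string(ini_text)

log_path = config["database"]["log"]     # interpolated from DEFAULT's base
inherited = config["database"]["base"]   # DEFAULT keys are visible in every section
raw_port = config["database"]["port"]    # plain access always returns a string
```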

Envfile (.env)

Envfiles store environment variables. Used by Docker, docker-compose, systemd, and application frameworks.

Format

# .env file — shell-compatible key=value pairs
APP_ENV=production
DATABASE_URL=postgres://user:pass@db:5432/myapp
SECRET_KEY=a1b2c3d4e5f6
PORT=8080

# Quotes recommended for values with spaces
APP_NAME="My Application"

# No spaces around = (some parsers break on spaces)
# Variable expansion (some parsers only): ${BASE_URL}/users
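Parsers differ in the details, but the core rules fit in a few lines. The sketch below handles comments, splitting on the first =, and one pair of surrounding quotes; it deliberately skips variable expansion, export prefixes, and multiline values. The function name parse_env is hypothetical.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines; a sketch, not a full dotenv parser."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Split on the FIRST '=' only, so URLs and tokens containing '=' survive.
        key, sep, value = line.partition("=")
        if not sep:
            continue  # ignore lines without '='
        value = value.strip()
        # Remove one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


env_text = """
# .env file
APP_ENV=production
DATABASE_URL=postgres://user:pass@db:5432/myapp
APP_NAME="My Application"
"""
env = parse_env(env_text)
```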

docker-compose env

services:
  app:
    env_file: [.env, .env.local]        # .env.local overrides .env
    environment:
      - APP_ENV=production              # inline, overrides env_file
      - DATABASE_URL                    # passes through from host shell
    ports:
      - "${APP_PORT:-8080}:8080"        # variable substitution with default

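Precedence for a container's variables (highest wins): environment: > env_file: > Dockerfile ENV. Separately, ${VAR} substitution in the Compose file itself reads the host shell environment first and falls back to the project-level .env file.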
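The container-level part of this ordering (environment: over env_file: over Dockerfile ENV) behaves like a layered lookup. The sketch below models it with collections.ChainMap; the variable names and values are hypothetical, and this only illustrates the layering, not what Compose runs internally.

```python
from collections import ChainMap

dockerfile_env = {"APP_ENV": "development", "LOG_LEVEL": "info"}  # image ENV
env_file_vars = {"APP_ENV": "staging", "PORT": "8080"}            # env_file:
environment_vars = {"APP_ENV": "production"}                      # environment:

# ChainMap searches its maps left to right, so the highest-priority layer goes first.
container_env = ChainMap(environment_vars, env_file_vars, dockerfile_env)
resolved = dict(container_env)
```

Here APP_ENV resolves to "production", while PORT and LOG_LEVEL fall through to the lower layers that define them.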

Shell Sourcing

# Source an env file in bash (preferred method)
set -a && source .env && set +a

# Caution: export $(grep -v '^#' .env | xargs) breaks on values with spaces:
# xargs strips the quotes, then the shell word-splits "My Application" in two
# See Footguns doc for details

Tools

jq — JSON Processor

jq is the standard tool for querying and transforming JSON from the command line.

# Field access and nesting
echo '{"name":"nginx","ver":"1.25"}' | jq '.name'           # "nginx"
kubectl get pod mypod -o json | jq '.spec.containers[0].image'

# Array iteration and filtering
echo '[1,2,3,4,5]' | jq '.[] | . * 2'                       # 2 4 6 8 10
kubectl get pods -o json | jq '.items[] | select(.status.phase == "Running") | .metadata.name'

# Map, reduce, construct new objects
echo '[{"n":"a","v":1},{"n":"b","v":2}]' | jq 'map(.v)'     # [1, 2]
echo '[1,2,3,4,5]' | jq 'reduce .[] as $x (0; . + $x)'     # 15
kubectl get pods -o json | jq '.items[] | {name: .metadata.name, status: .status.phase}'

# String interpolation and encoding
echo '{"host":"db","port":5432}' | jq '"jdbc:postgresql://\(.host):\(.port)/mydb"'
echo '{"user":"admin"}' | jq -r '.user | @base64'           # YWRtaW4=

# Slurp mode (read multiple inputs into array), raw output (-r), null safety
cat *.json | jq -s 'map(select(.status == "error"))'
echo '{"name":"nginx"}' | jq -r '.name'                     # nginx (no quotes)
echo '{}' | jq '.missing // "default"'                      # "default"

# Group and aggregate
echo '[{"t":"a","v":2},{"t":"b","v":1},{"t":"a","v":3}]' | \
  jq 'group_by(.t) | map({type: .[0].t, total: map(.v) | add})'
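For readers more comfortable with Python than jq, the group-and-aggregate filter above corresponds roughly to the following sketch. It is meant only to show what the pipeline computes.

```python
from itertools import groupby
from operator import itemgetter

records = [{"t": "a", "v": 2}, {"t": "b", "v": 1}, {"t": "a", "v": 3}]

# jq's group_by sorts by the key first; itertools.groupby needs sorted input too.
ordered = sorted(records, key=itemgetter("t"))

totals = [
    {"type": key, "total": sum(item["v"] for item in group)}
    for key, group in groupby(ordered, key=itemgetter("t"))
]
```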

yq — YAML Processor (Mike Farah's Go Version)

yq is jq for YAML. The Go version by Mike Farah is the standard.

yq '.metadata.name' deployment.yaml                    # read a value
yq '.spec.template.spec.containers[0].image' deploy.yaml  # nested access
yq -i '.spec.replicas = 5' deployment.yaml             # set in-place
yq -i 'del(.metadata.annotations)' deployment.yaml     # delete a field
yq -o json deployment.yaml                             # YAML to JSON
yq -P -o yaml input.json                               # JSON to YAML (-o yaml makes the output format explicit)
yq eval-all 'select(.kind == "Deployment")' multi.yaml # filter multi-doc
yq eval-all 'select(fileIndex == 0) * select(fileIndex == 1)' base.yaml override.yaml  # merge

yamllint

yamllint deployment.yaml
yamllint -d '{extends: relaxed, rules: {line-length: {max: 200}, truthy: disable}}' file.yaml
yamllint -d relaxed k8s/manifests/          # check entire directory

jsonlint / python -m json.tool

python3 -m json.tool config.json            # pretty-print (available everywhere)
jq '.' config.json                          # pretty-print with jq
jq -c '.' config.json                       # compact (remove whitespace)
python3 -c "import json; json.load(open('config.json'))"  # validate in scripts

envsubst

export APP_NAME=myapp APP_PORT=8080
envsubst < template.yaml > output.yaml                 # substitute all env vars
envsubst '$APP_NAME $APP_PORT' < template.yaml > out.yaml  # restrict to specific vars
cat template.yaml | envsubst | kubectl apply -f -       # pipe into kubectl
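When envsubst is not available, Python's string.Template offers a similar $VAR and ${VAR} syntax. This is an analogy rather than a drop-in replacement; the template text and the HOME_DIR variable are hypothetical.

```python
from string import Template

template = Template("name: ${APP_NAME}\nport: $APP_PORT\nhome: $HOME_DIR\n")
variables = {"APP_NAME": "myapp", "APP_PORT": "8080", "HOME_DIR": "/srv"}

# Substitute every placeholder, like plain `envsubst`.
rendered_all = template.substitute(variables)

# Substitute only selected names, like `envsubst '$APP_NAME $APP_PORT'`;
# safe_substitute leaves placeholders it has no value for untouched.
allowed = {name: variables[name] for name in ("APP_NAME", "APP_PORT")}
rendered_some = template.safe_substitute(allowed)
```

One behavioral difference to keep in mind: envsubst replaces unset variables with an empty string, whereas Template.substitute raises KeyError for a missing name.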

Format Comparison

Feature     | YAML           | JSON      | TOML       | INI           | Envfile
Comments    | #              | No        | #          | ; or #        | #
Nesting     | Unlimited      | Unlimited | Tables     | 1 level       | None
Types       | Rich (gotchas) | 6 strict  | Strong     | Strings       | Strings
Primary use | Infra/k8s      | APIs/data | App config | System/legacy | Environment
