# YAML, JSON & Config Formats - Footguns
Mistakes that break deployments, corrupt configs, or waste hours of debugging. Each one has bitten real teams in production.
## 1. The Norway Problem (YAML NO = false)
A country list like `countries: [GB, US, NO]` parses as `[GB, US, false]`: any unquoted scalar matching YAML 1.1's boolean set (`yes`/`no`/`on`/`off`/`y`/`n`, case-insensitive) gets coerced to a boolean. Real-world victims: the GitHub Actions `on:` trigger key (parsed as boolean `true`), Helm values with `yes`/`no` toggles, Ansible variables.
Fix: Always quote: `"NO"`. Or use an explicit tag: `!!str NO`. Or use a YAML 1.2 parser, which only treats `true`/`false` as booleans.
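The resolution rule is small enough to sketch by hand. This is an illustration of YAML 1.1's boolean matching written for this article, not any parser's actual code (the spec's pattern also covers `true`/`false` variants):

```python
import re

# Plain scalars matching this pattern resolve to booleans under YAML 1.1.
BOOL_PATTERN = re.compile(
    r"^(?:y|Y|yes|Yes|YES|n|N|no|No|NO"
    r"|true|True|TRUE|false|False|FALSE"
    r"|on|On|ON|off|Off|OFF)$"
)

def resolve_scalar(value: str):
    """Return what a YAML 1.1 parser makes of an unquoted scalar."""
    if BOOL_PATTERN.match(value):
        return value.lower() in ("y", "yes", "true", "on")
    return value  # stays a string

print([resolve_scalar(c) for c in ["GB", "US", "NO"]])
# ['GB', 'US', False]
```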
## 2. Tabs in YAML Indentation
YAML forbids tabs for indentation. A single tab produces a parse error with an unhelpful message. Common sources: pasting from wikis, Slack, or editors configured to use tabs.
Fix: Configure your editor to use spaces in YAML files. Add `indent_style = space` to `.editorconfig`. Detect tabs with `grep -P '\t' file.yaml`.
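If `grep -P` is unavailable (e.g. BSD grep on macOS), a few lines of Python do the same check. `tab_indented_lines` is a hypothetical helper written for this article:

```python
def tab_indented_lines(text: str) -> list[int]:
    """Return 1-based line numbers whose indentation contains a tab."""
    bad = []
    for lineno, line in enumerate(text.splitlines(), 1):
        indent = line[: len(line) - len(line.lstrip())]
        if "\t" in indent:
            bad.append(lineno)
    return bad

yaml_text = "key:\n\tchild: 1\n  ok: 2\n"
print(tab_indented_lines(yaml_text))  # [2]
```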
## 3. Trailing Whitespace Breaking YAML
A line that looks empty but carries trailing spaces becomes content inside multiline strings. Your "blank line" is actually `" \n"` instead of `"\n"`.
Fix: Strip trailing whitespace on save. In CI: `grep -rn ' $' *.yaml`.
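A normalizer for build scripts can be this small. `strip_trailing_ws` is a sketch written for this article, not a standard utility:

```python
def strip_trailing_ws(text: str) -> str:
    """Drop trailing spaces/tabs from each line so 'blank' lines are truly empty."""
    return "".join(line.rstrip(" \t") + "\n" for line in text.splitlines())

broken = "first line   \n \nsecond line\n"
print(repr(strip_trailing_ws(broken)))  # 'first line\n\nsecond line\n'
```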
## 4. YAML 1.1 Octal Notation (0777 = 511)
In YAML 1.1, a leading zero means octal: `0777` parses as the integer 511. Your Helm chart sets permissions to `0644`, the Go template receives 420, and your chmod call does something entirely different.
Fix: Always quote permission values: `"0777"`. Or use YAML 1.2 syntax: `0o777`.
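You can verify the arithmetic with plain Python, whose `int()` takes an explicit base:

```python
# YAML 1.1 reads a leading zero as octal; YAML 1.2 requires the 0o prefix.
print(int("777", 8))  # 511 - what a YAML 1.1 parser does with 0777
print(int("644", 8))  # 420 - why the template sees 420 when you write 0644
print(0o777)          # 511 - YAML 1.2 / Python octal literal
```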
## 5. Unquoted Strings That Look Like Numbers or Booleans
```yaml
version: 1.20   # float 1.2 (trailing zero dropped!)
zipcode: 01234  # YAML 1.1: octal 668
country: NO     # boolean false
empty: null     # null, not string "null"
tilde: ~        # null, not string "~"
time: 1:30      # YAML 1.1: sexagesimal = integer 90
```
Fix: Always quote version numbers, zip codes, country codes, and anything else that could parse as a number or boolean. Use yamllint's `truthy` rule to catch the booleans.
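The sexagesimal case deserves a closer look: YAML 1.1 treats colon-separated digit groups as base-60 integers. A minimal sketch of that rule, written for this article:

```python
# YAML 1.1 "sexagesimal" integers: colon-separated base-60 digit groups.
def sexagesimal(value: str) -> int:
    result = 0
    for part in value.split(":"):
        result = result * 60 + int(part)
    return result

print(sexagesimal("1:30"))   # 90 - your "1:30" duration is now the int 90
print(sexagesimal("1:0:0"))  # 3600
```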
## 6. JSON Without Trailing Newline
Many tools generate JSON without a trailing newline. Git warns, diff tools complain, and `cat a.json b.json` produces invalid JSON (two objects glued together on one line).
Fix: Add `insert_final_newline = true` for `*.json` in `.editorconfig`. Or reformat with `jq '.' file.json > tmp && mv tmp file.json` (jq always emits a final newline).
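Python's `json` module is one of the offenders: `json.dumps`/`json.dump` never emit a trailing newline, so you must append it yourself. `to_json_text` is a hypothetical wrapper for illustration:

```python
import json

def to_json_text(obj) -> str:
    """Serialize with a final newline so git/diff/cat behave."""
    return json.dumps(obj, indent=2) + "\n"

print(repr(json.dumps({"a": 1})))             # '{"a": 1}' - no newline
print(to_json_text({"a": 1}).endswith("\n"))  # True
```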
## 7. .env File Quote Handling
Different .env parsers handle quotes differently. Given a file containing `APP_NAME="My App"` and `SECRET='s3cr3t'`:
| Parser | APP_NAME value | SECRET value |
|---|---|---|
| docker-compose | My App | s3cr3t |
| bash `source` | My App | s3cr3t |
| Node.js dotenv | My App | s3cr3t |
| Python python-dotenv | My App | s3cr3t |
| Some CI systems | "My App" (quotes included!) | 's3cr3t' (quotes included!) |
| `export $(cat .env \| xargs)` | "My App" (quotes included!) | 's3cr3t' (quotes included!) |
```bash
# The xargs approach includes quotes as part of the value
export $(grep -v '^#' .env | xargs)
echo $APP_NAME  # "My App" — the double quotes are IN the value

# Safer approach
set -a && source .env && set +a
echo $APP_NAME  # My App — correct
```
Fix: Know which parser you are using. Test with values that contain quotes and spaces. Prefer source over xargs for shell usage. For CI, check the docs for your specific platform.
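To see why the table diverges, here is a minimal sketch of a dotenv-style parser that strips one layer of matching quotes, roughly the docker-compose/dotenv behavior above. `parse_env` is written for this article, not any library's actual code:

```python
import re

LINE = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)\s*=\s*(.*?)\s*$")

def parse_env(text: str) -> dict:
    """Parse KEY=value lines, stripping one layer of matching quotes."""
    env = {}
    for raw in text.splitlines():
        if not raw.strip() or raw.lstrip().startswith("#"):
            continue  # skip blanks and comments
        m = LINE.match(raw)
        if not m:
            continue
        key, value = m.group(1), m.group(2)
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]  # the step naive parsers skip
        env[key] = value
    return env

print(parse_env('APP_NAME="My App"\nSECRET=\'s3cr3t\''))
# {'APP_NAME': 'My App', 'SECRET': 's3cr3t'}
```

Parsers that skip the quote-stripping step are the "quotes included!" rows in the table.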
## 8. Multiline String Trailing Newlines
YAML's literal block scalar has three chomping variants (`|`, `|-`, `|+`) with different trailing-newline behavior, and getting it wrong breaks scripts and configs:
```yaml
# | (literal) — includes one trailing newline
with_newline: |
  hello
# value: "hello\n"

# |- (literal strip) — no trailing newline
no_newline: |-
  hello
# value: "hello"

# |+ (literal keep) — preserves ALL trailing newlines
keep_newlines: |+
  hello

# value: "hello\n\n" (blank line preserved)
```
The trap: You write a shell script in a ConfigMap using | and it works. You switch to |- to "clean it up" and now the script fails because the last command has no newline terminator. Or you use |+ and your config file has extra blank lines that a strict parser rejects.
```yaml
# This will fail because the script has no trailing newline
data:
  init.sh: |-
    #!/bin/bash
    echo "starting"
    exec /app/server
# Some shells need the trailing newline after the last command

# This works
data:
  init.sh: |
    #!/bin/bash
    echo "starting"
    exec /app/server
```
Fix: Use | (literal block) for scripts. Use |- for values where trailing newlines cause problems (URLs, connection strings). Never use |+ unless you explicitly need to preserve trailing blank lines.
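The three chomping rules can be sketched in a few lines. `chomp` is a hand-written illustration of the behavior, not parser code:

```python
def chomp(text: str, indicator: str) -> str:
    """Apply |, |- or |+ trailing-newline rules to a block scalar body."""
    stripped = text.rstrip("\n")
    if indicator == "|-":
        return stripped           # strip: no trailing newline
    if indicator == "|":
        return stripped + "\n"    # clip: exactly one trailing newline
    return text                   # keep (|+): preserve all newlines

body = "hello\n\n"
print(repr(chomp(body, "|")))   # 'hello\n'
print(repr(chomp(body, "|-")))  # 'hello'
print(repr(chomp(body, "|+")))  # 'hello\n\n'
```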
## 9. YAML Anchor Aliased Values Are Shared References
In some YAML implementations, aliases share the same reference as the anchor. Modifying the aliased value modifies the original.
```yaml
defaults: &defaults
  timeout: 30
  retries: 3

service_a:
  <<: *defaults
  timeout: 60   # this is fine — explicit override

service_b:
  <<: *defaults # gets timeout: 30, retries: 3
```
In most parsers this works correctly because the merge creates a copy. But if you use a library that preserves references (like some Python ruamel.yaml round-trip modes), mutating service_b['retries'] in code also mutates defaults['retries'].
```python
import yaml

with open('config.yaml') as f:
    data = yaml.safe_load(f)
# PyYAML copies keys when it resolves a << merge, so the merged dicts
# are independent. But a direct alias (key: *defaults) yields the
# SAME object as the anchor.

from ruamel.yaml import YAML

ry = YAML()
with open('config.yaml') as f:
    data = ry.load(f)
# ruamel.yaml may preserve alias relationships for round-trip fidelity:
# data['service_b']['retries'] = 5 might also change defaults['retries']
```
Fix: If you manipulate YAML programmatically after loading, verify that your parser resolves anchors into independent copies. Test by modifying an aliased value and checking the original.
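The trap is the same one plain Python dicts have, so it can be reproduced without any YAML library at all:

```python
import copy

# An alias that is not deep-copied mutates the original.
defaults = {"timeout": 30, "retries": 3}
service_b = defaults                 # alias: same object
service_b["retries"] = 5
print(defaults["retries"])           # 5 - the "original" changed too

defaults = {"timeout": 30, "retries": 3}
service_b = copy.deepcopy(defaults)  # independent copy
service_b["retries"] = 5
print(defaults["retries"])           # 3 - unaffected
```

This is exactly the test the Fix recommends: mutate the aliased value, then check the anchor.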
## 10. YAML Merge Key Override Order
When using << with multiple merge sources, the precedence is non-obvious and inconsistent across implementations:
```yaml
high: &high
  timeout: 60
  retries: 5

low: &low
  timeout: 10
  retries: 1

service:
  <<: [*high, *low]  # which timeout wins?
  retries: 3         # explicit key always wins
```
Per the YAML merge key spec, the first mapping in the list takes precedence for duplicate keys. So timeout is 60 (from *high). But explicit keys (retries: 3) always override merged values.
The problem: Not all parsers implement this correctly. Some reverse the order. Some don't support merge key lists at all. The merge key (<<) is not part of the YAML 1.2 core spec — it is a type-specific extension from YAML 1.1.
Fix: Avoid complex merge chains. If you must use merges, test with your specific parser. Prefer explicit repetition over clever merge hierarchies — the next person debugging at 3 AM will thank you.
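The spec's precedence rule, sketched with plain dicts (`merge` is a hypothetical helper written for this article; earlier merge sources win among themselves, explicit keys win over all of them):

```python
def merge(mapping: dict) -> dict:
    """Resolve a '<<' key per the YAML merge-key spec."""
    sources = mapping.pop("<<", [])
    if isinstance(sources, dict):
        sources = [sources]
    result = {}
    for src in reversed(sources):  # apply later sources first so
        result.update(src)         # earlier ones overwrite them
    result.update(mapping)         # explicit keys win last
    return result

high = {"timeout": 60, "retries": 5}
low = {"timeout": 10, "retries": 1}
service = merge({"<<": [high, low], "retries": 3})
print(service)  # {'timeout': 60, 'retries': 3}
```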
## 11. INI Duplicate Sections
INI has no formal specification, so parsers handle duplicate sections differently:
```ini
[database]
host = primary.db.internal
port = 5432

# 200 lines later...
[database]
host = replica.db.internal
port = 5433
```
| Parser | Behavior |
|---|---|
| Python configparser | Raises DuplicateSectionError (default) or silently merges (last wins) |
| PHP parse_ini_file | Last section wins (silently) |
| systemd | Both sections merged (last value wins per key) |
| Git config | Both sections preserved (can have multiple values per key) |
| Some Java parsers | First section wins (silently drops second) |
Fix: Lint your INI files for duplicate section headers. Use a format with a real spec (TOML, YAML) if you need guaranteed behavior.
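Python's stdlib `configparser` shows both behaviors from the table, depending on the `strict` flag:

```python
import configparser

INI = """
[database]
host = primary.db.internal

[database]
host = replica.db.internal
"""

# Default (strict=True): duplicate sections raise immediately.
try:
    configparser.ConfigParser().read_string(INI)
except configparser.DuplicateSectionError as e:
    print("rejected:", e.section)  # rejected: database

# strict=False silently merges; last value wins per key.
cp = configparser.ConfigParser(strict=False)
cp.read_string(INI)
print(cp["database"]["host"])  # replica.db.internal
```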
## 12. TOML Array of Tables Syntax Confusion
TOML uses [[double brackets]] for arrays of tables, and the interaction with regular [single brackets] is easy to get wrong:
```toml
# ERROR: once 'server' is an array of tables, it cannot be
# reopened as a plain table
[[server]]
host = "web1"

[server]         # parse error: "server" is already an array
port = 8080
```
A sub-table header like `[server.logging]` after `[[server]]` is valid TOML, but it attaches to the most recently defined array element only, which is its own trap:
```toml
[[server]]
host = "web1"

[[server]]
host = "web2"

[server.logging] # belongs to web2 ONLY; web1 gets no logging table
level = "info"
```
The correct pattern for nested config in array elements is to use inline tables, which keep each element's settings unambiguously together:
```toml
[[server]]
host = "web1"
tls = {cert = "/path/cert.pem", key = "/path/key.pem"}

[[server]]
host = "web2"
tls = {cert = "/other/cert.pem", key = "/other/key.pem"}
```
Fix: Keep TOML array-of-tables flat. Put complex nested config in separate files. Use a TOML linter (taplo lint) to catch structural errors before they hit production.
## 13. jq Null Propagation
jq propagates null silently through expressions, hiding missing data instead of erroring:
```bash
# Missing field returns null, no error
echo '{"name":"app"}' | jq '.version'
# null

# Null propagates through operations (the length of null is 0)
echo '{"name":"app"}' | jq '.version | length'
# 0

# Null in arithmetic is silently absorbed (null + 1 is 1, not an error!)
echo '{"a": 1}' | jq '.b + 1'
# 1

# Null in string interpolation
echo '{}' | jq '"Version: \(.version)"'
# "Version: null"

# Array access on null
echo '{}' | jq '.items[0].name'
# null
```
This means your scripts silently produce garbage when data is missing:
```bash
IMAGE=$(kubectl get pod mypod -o json | jq -r '.spec.containers[0].image')
# If the field is missing, IMAGE is "null" (the string, not empty).
# Your subsequent docker pull null will fail in confusing ways.
```
Fix: Use the // (alternative) operator for defaults, and ? or try-catch for optional access:
```bash
# Default value
echo '{}' | jq '.version // "unknown"'
# "unknown"

# Error on null (will exit non-zero)
echo '{}' | jq '.version // error("version field is missing")'

# Check for null explicitly in scripts
IMAGE=$(kubectl get pod mypod -o json | jq -r '.spec.containers[0].image // empty')
if [ -z "$IMAGE" ]; then
  echo "ERROR: could not get image" >&2
  exit 1
fi

# Use -e flag (exit status reflects truthiness of output)
if echo '{}' | jq -e '.version' > /dev/null 2>&1; then
  echo "version exists"
else
  echo "version missing"
fi
```