jq — JSON Processing Primer

Why This Matters

Most web APIs, kubectl (with -o json), and cloud CLIs can return JSON. jq is the de facto standard tool for slicing, filtering, and transforming that JSON on the command line. Without jq, you end up writing fragile grep/awk pipelines against structured data. With jq, you parse the JSON properly and query it by structure.

Name origin: jq is modeled after sed and awk but for JSON data. The name follows the Unix tradition of short, lowercase tool names. Written by Stephen Dolan, released in 2012. The official tagline is "jq is like sed for JSON data."

Fun fact: jq has its own Turing-complete functional language with recursion, conditionals, string interpolation, and user-defined functions. Most people use only a small part of it: .field, select(), map(), and @csv cover the majority of DevOps use cases.

Core Concepts

Identity and Basic Filters

# Pretty-print JSON
echo '{"name":"web","replicas":3}' | jq .

# Extract a field
echo '{"name":"web","replicas":3}' | jq '.name'
# "web"

# Nested access
echo '{"spec":{"replicas":3}}' | jq '.spec.replicas'
# 3

Array Access

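# First element
echo '[1,2,3]' | jq '.[0]'
# 1

# Last element (negative indices count from the end)
echo '[1,2,3]' | jq '.[-1]'
# 3

# Slice: start index inclusive, end index exclusive
echo '[1,2,3,4,5]' | jq '.[2:4]'
# [3,4]

# Iterate all elements (each element becomes a separate output)
echo '[1,2,3]' | jq '.[]'
# 1
# 2
# 3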

Filtering with select

Remember: The jq essentials mnemonic is "DSM": Dot (.field for access), Select (select() for filtering), Map (map() for transformation). These three operations cover most day-to-day DevOps jq usage. Master them before learning anything else.

select() keeps elements matching a condition:

# Pods not in Running state
kubectl get pods -o json | jq '.items[] | select(.status.phase != "Running") | .metadata.name'

# Events after a fixed cutoff (ISO 8601 UTC timestamps in the same format compare correctly as strings)
jq '.items[] | select(.lastTimestamp > "2024-01-01T12:00:00Z")' events.json

# Nodes reporting the MemoryPressure condition
kubectl get nodes -o json | jq '.items[] | select(.status.conditions[] | select(.type=="MemoryPressure" and .status=="True")) | .metadata.name'
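
The kubectl examples above need a cluster to run. As a minimal sketch, the same select() patterns can be tried against inline data. The pod and node documents below are hypothetical stand-ins for kubectl output, and the any(generator; condition) form is offered as an alternative to the nested select, not as the original author's method.

```shell
# Hypothetical sample data standing in for `kubectl get pods -o json`
pods='{"items":[
  {"metadata":{"name":"web-1"},"status":{"phase":"Running"}},
  {"metadata":{"name":"web-2"},"status":{"phase":"Pending"}},
  {"metadata":{"name":"job-1"},"status":{"phase":"Failed"}}
]}'

# select() passes an item through only when the condition is true
not_running=$(printf '%s' "$pods" |
  jq -r '.items[] | select(.status.phase != "Running") | .metadata.name')
printf '%s\n' "$not_running"
# web-2
# job-1

# Hypothetical node list; any(generator; condition) yields one boolean per node
nodes='{"items":[
  {"metadata":{"name":"node-a"},"status":{"conditions":[
    {"type":"MemoryPressure","status":"False"},{"type":"Ready","status":"True"}]}},
  {"metadata":{"name":"node-b"},"status":{"conditions":[
    {"type":"MemoryPressure","status":"True"},{"type":"Ready","status":"True"}]}}
]}'
pressured=$(printf '%s' "$nodes" | jq -r '
  .items[]
  | select(any(.status.conditions[]; .type == "MemoryPressure" and .status == "True"))
  | .metadata.name')
printf '%s\n' "$pressured"
# node-b
```

Because any() returns a single true or false per node, each matching node is emitted exactly once.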

Transforming with map

map() applies a filter to every array element:

# Extract all names
echo '[{"name":"a"},{"name":"b"}]' | jq 'map(.name)'
# ["a","b"]

# Add a field to every object
echo '[{"host":"web1"},{"host":"web2"}]' | jq 'map(. + {"env":"prod"})'
# [{"host":"web1","env":"prod"},{"host":"web2","env":"prod"}]
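
map() and select() are often combined. The sketch below, using a made-up host list, filters an array while keeping it an array, and shows that map(f) is equivalent to [.[] | f].

```shell
# Hypothetical inventory: which hosts are up?
hosts='[{"host":"web1","up":true},{"host":"web2","up":false},{"host":"db1","up":true}]'

# map(select(...)) drops non-matching elements but keeps the array wrapper
up_hosts=$(printf '%s' "$hosts" | jq -c 'map(select(.up)) | map(.host)')
printf '%s\n' "$up_hosts"
# ["web1","db1"]

# The same result written with explicit iteration and array construction
same=$(printf '%s' "$hosts" | jq -c '[.[] | select(.up) | .host]')
printf '%s\n' "$same"
# ["web1","db1"]
```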

Constructing Objects

Build new JSON structures:

# Pick specific fields
kubectl get pods -o json | jq '.items[] | {name: .metadata.name, status: .status.phase}'

# Array of constructed objects
kubectl get nodes -o json | jq '[.items[] | {name: .metadata.name, cpu: .status.capacity.cpu}]'
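
For experimenting without a cluster, here is a small sketch of object construction on inline data. The service and node documents are hypothetical; {host, port} is jq's shorthand for {host: .host, port: .port}.

```shell
# Hypothetical service record
svc='{"host":"db1","port":5432,"tags":["primary"],"meta":{"owner":"dba"}}'

# Shorthand keys copy same-named fields; explicit keys can reach into nested data
picked=$(printf '%s' "$svc" | jq -c '{host, port, owner: .meta.owner}')
printf '%s\n' "$picked"
# {"host":"db1","port":5432,"owner":"dba"}

# Hypothetical node list, shaped like `kubectl get nodes -o json`
nodes='{"items":[
  {"metadata":{"name":"n1"},"status":{"capacity":{"cpu":"4"}}},
  {"metadata":{"name":"n2"},"status":{"capacity":{"cpu":"8"}}}
]}'

# Wrapping the pipeline in [...] collects the constructed objects into one array
summary=$(printf '%s' "$nodes" |
  jq -c '[.items[] | {name: .metadata.name, cpu: .status.capacity.cpu}]')
printf '%s\n' "$summary"
# [{"name":"n1","cpu":"4"},{"name":"n2","cpu":"8"}]
```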

Reduce

Aggregate values across elements:

# Sum all replica counts
echo '[{"r":3},{"r":5},{"r":2}]' | jq 'reduce .[] as $item (0; . + $item.r)'
# 10

# Collect unique namespaces (no reduce needed: unique sorts and de-duplicates)
kubectl get pods -A -o json | jq '[.items[].metadata.namespace] | unique'
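
reduce is most useful when the accumulator is more than a running total. As a sketch on hypothetical data, the block below counts pods per namespace by building an object, and shows that a plain sum can also be written with add.

```shell
# Hypothetical pod list, reduced to just the namespace field
pods='[{"ns":"default"},{"ns":"kube-system"},{"ns":"default"}]'

# Start from {} and increment a counter keyed by namespace (null + 1 is 1 in jq)
counts=$(printf '%s' "$pods" | jq -c 'reduce .[] as $p ({}; .[$p.ns] += 1)')
printf '%s\n' "$counts"
# {"default":2,"kube-system":1}

# For a simple sum, map + add is a shorter equivalent of the reduce form
total=$(echo '[{"r":3},{"r":5},{"r":2}]' | jq 'map(.r) | add')
printf '%s\n' "$total"
# 10
```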

Slurp Mode (-s)

Read every input value into a single array before the filter runs:

# Merge multiple JSON lines into an array
jq -s '.' lines.jsonl

# Count records
jq -s 'length' lines.jsonl

# Sort by a field
jq -s 'sort_by(.timestamp)' lines.jsonl
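
The difference -s makes is easiest to see side by side. This sketch uses three made-up JSON Lines records: without -s the filter runs once per record, with -s it runs once on the array of all records.

```shell
# Hypothetical JSON Lines input: one object per line
lines='{"timestamp":3,"msg":"c"}
{"timestamp":1,"msg":"a"}
{"timestamp":2,"msg":"b"}'

# Without -s: length is evaluated per object (each has 2 keys)
per_record=$(printf '%s\n' "$lines" | jq 'length')
printf '%s\n' "$per_record"
# 2
# 2
# 2

# With -s: length is evaluated once, on the array of 3 records
record_count=$(printf '%s\n' "$lines" | jq -s 'length')
printf '%s\n' "$record_count"
# 3

# Sorting needs the whole set, so it also requires -s
sorted_msgs=$(printf '%s\n' "$lines" | jq -s -c 'sort_by(.timestamp) | map(.msg)')
printf '%s\n' "$sorted_msgs"
# ["a","b","c"]
```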

Raw Output (-r)

Gotcha: Forgetting -r when piping jq output to other commands is the #1 jq mistake. Without -r, strings include literal quotes: "web-pod-abc" instead of web-pod-abc. Those quotes break xargs, for loops, and any tool expecting clean input. Always use -r when jq output feeds into other commands.

Strip quotes from string output — essential for scripting:

# Without -r: "web-pod-abc"
# With -r: web-pod-abc
kubectl get pod web -o json | jq -r '.metadata.name'

# Use in a loop
for ns in $(kubectl get ns -o json | jq -r '.items[].metadata.name'); do
    echo "Namespace: $ns"
done
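
A small sketch of the quoting difference, plus a loop variant that reads one line at a time instead of relying on word splitting. The namespace document is hypothetical sample data; the while/read form is a common shell idiom, not something the original prescribes.

```shell
# Hypothetical stand-in for `kubectl get ns -o json`
ns_json='{"items":[{"metadata":{"name":"default"}},{"metadata":{"name":"kube-system"}}]}'

quoted=$(printf '%s' "$ns_json" | jq '.items[0].metadata.name')
raw=$(printf '%s' "$ns_json" | jq -r '.items[0].metadata.name')
printf '%s\n' "$quoted"   # "default"  (JSON string, quotes included)
printf '%s\n' "$raw"      # default    (plain text)

# Read jq -r output line by line; values containing spaces stay intact
count=0
while IFS= read -r ns; do
    echo "Namespace: $ns"
    count=$((count + 1))
done <<EOF
$(printf '%s' "$ns_json" | jq -r '.items[].metadata.name')
EOF
echo "Seen: $count"
# Namespace: default
# Namespace: kube-system
# Seen: 2
```

Feeding the loop from a here-document keeps it in the current shell, so variables such as count survive after the loop ends.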

Piping with curl and kubectl

# GitHub API: list repo names
curl -s https://api.github.com/users/octocat/repos | jq -r '.[].name'

# AWS: list running instance IDs
aws ec2 describe-instances | jq -r '.Reservations[].Instances[] | select(.State.Name=="running") | .InstanceId'

String Interpolation

# Build a formatted string
echo '{"host":"db1","port":5432}' | jq -r '"\(.host):\(.port)"'
# db1:5432

# CSV output
kubectl get pods -o json | jq -r '.items[] | [.metadata.name, .status.phase] | @csv'
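
To see what @csv actually emits, here is a sketch on hypothetical pod data, including a name that contains a comma. It also adds a header row and shows @tsv, which produces tab-separated output.

```shell
# Hypothetical pod list; the second name contains a comma on purpose
pods='{"items":[
  {"metadata":{"name":"web-1"},"status":{"phase":"Running"}},
  {"metadata":{"name":"job,1"},"status":{"phase":"Failed"}}
]}'

# Header row first, then one row per pod; @csv quotes every string field
csv=$(printf '%s' "$pods" | jq -r '
  ["name","phase"], (.items[] | [.metadata.name, .status.phase]) | @csv')
printf '%s\n' "$csv"
# "name","phase"
# "web-1","Running"
# "job,1","Failed"

# @tsv separates fields with tabs and does not add quotes
tsv=$(printf '%s' "$pods" | jq -r '.items[] | [.metadata.name, .status.phase] | @tsv')
printf '%s\n' "$tsv"
```

Note that -r matters here too: without it, each CSV row would be printed as a JSON string with escaped quotes.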

Conditionals and Alternatives

# if-then-else
echo '{"status":"failed"}' | jq 'if .status == "failed" then "ALERT" else "OK" end'
# "ALERT"

# Alternative operator: falls back when the left side is missing, null, or false
echo '{}' | jq '.missing // "default"'
# "default"
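
One behaviour of // is worth trying out: it treats false the same as null. The sketch below, on made-up inputs, shows that case, an explicit null check that preserves false, and an elif chain.

```shell
# Missing key: // supplies the fallback
missing=$(echo '{}' | jq -r '.missing // "default"')
printf '%s\n' "$missing"
# default

# Gotcha: a real false value is also replaced by the fallback
flag_alt=$(echo '{"enabled":false}' | jq '.enabled // true')
printf '%s\n' "$flag_alt"
# true

# An explicit null check keeps false intact
flag_if=$(echo '{"enabled":false}' |
  jq 'if .enabled == null then true else .enabled end')
printf '%s\n' "$flag_if"
# false

# elif chains handle more than two cases
level=$(echo '{"cpu":85}' |
  jq -r 'if .cpu > 90 then "critical" elif .cpu > 80 then "warning" else "ok" end')
printf '%s\n' "$level"
# warning
```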

Built-in Functions

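Key functions: length, keys, values, unique, sort_by(.field), group_by(.field), to_entries/from_entries, flatten, type. Note that values is not the counterpart of keys: in jq it passes through only non-null inputs. To get an object's values as an array, use [.[]].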
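
A quick tour of several of these built-ins, as a sketch on small made-up inputs. jq -n runs the filter without reading input, which is handy for trying functions on literals.

```shell
obj='{"a":1,"b":2}'

keys_out=$(printf '%s' "$obj" | jq -c 'keys')        # ["a","b"]
vals_out=$(printf '%s' "$obj" | jq -c '[.[]]')       # [1,2]
entries=$(printf '%s' "$obj" | jq -c 'to_entries')   # [{"key":"a","value":1},{"key":"b","value":2}]

# with_entries(f) is to_entries | map(f) | from_entries
big_only=$(printf '%s' "$obj" | jq -c 'with_entries(select(.value > 1))')   # {"b":2}

flat=$(jq -n -c '[1,[2,[3]]] | flatten')             # [1,2,3]
uniq=$(jq -n -c '["b","a","b"] | unique')            # ["a","b"]
kind=$(printf '%s' "$obj" | jq -r 'type')            # object

# group_by sorts by the key and returns one sub-array per distinct value
pods='[{"ns":"default","name":"web"},{"ns":"kube-system","name":"dns"},{"ns":"default","name":"api"}]'
grouped=$(printf '%s' "$pods" |
  jq -c 'group_by(.ns) | map({ns: .[0].ns, count: length})')
printf '%s\n' "$grouped"
# [{"ns":"default","count":2},{"ns":"kube-system","count":1}]
```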

yq — YAML Equivalent

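yq applies jq-like syntax to YAML (and can convert between formats). The examples below use the syntax of the Go implementation (mikefarah/yq, v4); other tools named yq, such as the Python wrapper around jq, behave differently: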

# Read a YAML field
yq '.spec.replicas' deployment.yaml

# Convert YAML to JSON
yq -o json deployment.yaml

# Edit YAML in place
yq -i '.spec.replicas = 5' deployment.yaml

Debugging jq Expressions

  • Build incrementally: start with . and add one filter at a time
  • Use type to check what you are operating on
  • Use debug to print intermediate values to stderr: jq '.[] | debug | .name'
  • Use --arg to pass shell variables as strings (use --argjson for numbers and other JSON values): jq --arg h "$HOSTNAME" '.host = $h'
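
The tips above can be sketched on inline data. The variable host is a hypothetical stand-in for $HOSTNAME, and the inputs are made up for illustration.

```shell
# type tells you what the current value is before you index into it
kind=$(echo '{"items":[]}' | jq -r '.items | type')
printf '%s\n' "$kind"
# array

# debug passes its input through unchanged and writes a DEBUG message to stderr
name=$(echo '[{"name":"a"}]' | jq -r '.[] | debug | .name' 2>/dev/null)
printf '%s\n' "$name"
# a

# --arg always binds a string
host="web1"   # hypothetical value standing in for $HOSTNAME
with_host=$(echo '{"port":80}' | jq -c --arg h "$host" '.host = $h')
printf '%s\n' "$with_host"
# {"port":80,"host":"web1"}

# --arg vs --argjson: string "5" versus number 5
as_string=$(jq -n -c --arg n 5 '{replicas: $n}')
as_number=$(jq -n -c --argjson n 5 '{replicas: $n}')
printf '%s\n' "$as_string"   # {"replicas":"5"}
printf '%s\n' "$as_number"   # {"replicas":5}
```

Passing values with --arg also avoids splicing shell variables into the filter text, which sidesteps quoting problems.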

One-liner: kubectl get pods -A -o json | jq -r '.items[] | select(.status.phase != "Running") | "\(.metadata.namespace)/\(.metadata.name) \(.status.phase)"' is one of the most useful Kubernetes + jq commands. It lists every pod across all namespaces whose phase is not Running, printing namespace/name and phase, one pod per line. Pods that completed normally (phase Succeeded) are listed too.

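Under the hood: jq compiles your filter expression to bytecode and runs it on a small interpreter. By default it parses each top-level JSON value completely into memory before applying the filter, so a single multi-GB document still needs memory in proportion to its size. Input made of many small values, such as JSON Lines, is processed one value at a time and stays cheap. For one very large document, the --stream option parses incrementally and emits [path, value] events instead of building the whole value, at the cost of more awkward filters.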
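
One caveat worth checking on your own data: as far as I know, jq only parses incrementally within a single document when --stream is given. The sketch below, on a tiny made-up document, shows the event form that --stream produces and the value-at-a-time processing of JSON Lines input.

```shell
doc='{"a":[1,2]}'

# --stream emits [path, leaf] events, plus closing events that carry only a path
events=$(printf '%s' "$doc" | jq -c --stream '.')
printf '%s\n' "$events"
# [["a",0],1]
# [["a",1],2]
# [["a",1]]
# [["a"]]

# Several top-level values are handled one at a time, with no flag needed
per_value=$(printf '{"n":1}\n{"n":2}\n' | jq '.n')
printf '%s\n' "$per_value"
# 1
# 2
```

Filters written for --stream operate on these path events rather than on the original structure, so it is usually reserved for documents too large to load whole.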

