awk — Footguns¶
Mistakes that produce wrong output, break portability, or waste hours on subtle data-processing bugs.
1. awk field numbering starts at 1 (not 0)¶
$0 is the entire line. $1 is the first field. If you are used to zero-indexed languages, you will be off by one on every field reference.
# You want the second column from a CSV
awk -F, '{ print $1 }' data.csv
# Prints the FIRST column, not the second
# $0 = entire line
# $1 = first field
# $2 = second field
awk -F, '{ print $2 }' data.csv # correct
# NF is the number of fields, $NF is the LAST field
echo "a b c d" | awk '{ print $NF }' # d
echo "a b c d" | awk '{ print $(NF-1) }' # c
# NR is the record (line) number — also starts at 1
awk 'NR == 1' file.txt # first line (not NR == 0)
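A loop over fields makes the 1-based indexing concrete; this sketch prints each field next to its index:

```shell
# Fields run from 1 to NF, so the loop starts at 1, not 0
echo "alpha beta gamma" | awk '{
  for (i = 1; i <= NF; i++) print i, $i
}'
# 1 alpha
# 2 beta
# 3 gamma
```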
2. awk floating point precision¶
awk uses double-precision floating point. For most log parsing and metrics, this is fine. But if you are summing financial data or comparing values that need exact decimal precision, you will get bitten.
# Precision surprise
echo "" | awk '{ printf "%.20f\n", 0.1 + 0.2 }'
# 0.30000000000000004441 (not 0.3)
# Accumulation error over many rows
seq 1 1000000 | awk '{ sum += 0.1 } END { printf "%.10f\n", sum }'
# 100000.0000013329 (not 100000.0)
# Comparison footgun
echo "0.1 0.2 0.3" | awk '{ if ($1 + $2 == $3) print "equal"; else print "not equal" }'
# not equal
# Fix for comparisons: use epsilon
echo "0.1 0.2 0.3" | awk '{
diff = ($1 + $2) - $3
if (diff < 0) diff = -diff
if (diff < 0.0001) print "equal"; else print "not equal"
}'
# equal
# Fix for financial data: use integers (cents, not dollars)
# Note: int(x * 100 + 0.5) rounds non-negative values; negatives need -0.5
awk '{ sum += int($1 * 100 + 0.5) } END { printf "$%.2f\n", sum/100 }' prices.txt
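Putting the cents approach together with inline data (the amounts here are illustrative):

```shell
# Sum dollar amounts exactly by converting each one to integer cents first;
# the running total is then an exact integer, so no drift accumulates
printf '19.99\n0.01\n4.10\n' | awk '{
  cents += int($1 * 100 + 0.5)
} END {
  printf "$%.2f\n", cents / 100
}'
# $24.10
```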
3. Not quoting awk programs¶
If your awk program contains $ characters, shell variables, or special characters, the shell expands them before awk sees them.
# BUG: shell expands $1 to the shell's first argument (probably empty)
awk "{ print $1 }" file.txt
# Prints the entire line ($0) because $1 expanded to nothing
# Fix: ALWAYS single-quote awk programs
awk '{ print $1 }' file.txt
# When you need shell variables inside awk, use -v
NAME="nginx"
awk -v name="$NAME" '$0 ~ name { print }' access.log
# DON'T embed shell variables with double quotes
awk "/$NAME/ { print }" access.log
# If NAME contains regex metacharacters (., *, [), this breaks
# If NAME contains /, the awk program is syntactically invalid
# Pass multiple variables
awk -v threshold="$THRESH" -v prefix="$PREFIX" \
'$3 > threshold && $1 ~ prefix { print }' data.txt
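A related subtlety with -v: awk runs escape processing on the value, so backslash sequences like \t are rewritten before your program sees them. Reading the value from ENVIRON passes the bytes through untouched. A sketch (PATTERN is an illustrative name):

```shell
# The shell variable holds four literal characters: a \ t b
PATTERN='a\tb'

# -v interprets escapes: "\t" collapses to a single tab character
awk -v p="$PATTERN" 'BEGIN { print length(p) }' </dev/null
# 3

# ENVIRON delivers the value byte-for-byte
PATTERN="$PATTERN" awk 'BEGIN { print length(ENVIRON["PATTERN"]) }' </dev/null
# 4
```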
4. awk RS (Record Separator) misunderstanding¶
RS controls what separates records (lines). The default is newline. Changing it changes what $0, NR, and the whole program see. Getting this wrong silently processes your data as one giant record.
# Trying to parse paragraph-separated records
# Data:
# Name: Alice
# Role: Admin
#
# Name: Bob
# Role: User
# RS="" (empty string) means "paragraph mode": records are separated by runs
# of blank lines, and leading/trailing blank lines are stripped entirely
awk 'BEGIN { RS="" } { print NR, $0 }' data.txt
# This works for simple paragraph data, but:
# BUG: setting RS to a multi-character string in POSIX awk
awk 'BEGIN { RS="---" } { print NR }' data.txt
# POSIX: only the first character of RS is used ("-" here, so records split on every dash)
# gawk: uses the full string (GNU extension)
# Fix: use gawk explicitly if you need multi-char RS
gawk 'BEGIN { RS="---" } { print NR, $0 }' data.txt
# Fix: for POSIX portability, preprocess to a single-char separator that
# cannot appear in the data (avoid NUL: many awks cannot hold it in a string)
sep=$(printf '\001')
sed "s/---/$sep/g" data.txt | awk 'BEGIN { RS="\001" } { print NR }'
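Back to the Name/Role data above: paragraph mode pairs naturally with FS="\n", which makes each line of a record its own field. A sketch:

```shell
# RS="" splits records on blank lines; FS="\n" makes each line a field
printf 'Name: Alice\nRole: Admin\n\nName: Bob\nRole: User\n' |
awk 'BEGIN { RS=""; FS="\n" } { print NR ":", $1, "/", $2 }'
# 1: Name: Alice / Role: Admin
# 2: Name: Bob / Role: User
```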
5. Modifying a file you are reading from¶
Reading and writing the same file in a pipeline truncates it before the read completes. The file ends up empty.
# DESTROYS the file:
awk '{ print $1 }' data.txt > data.txt
# data.txt is now empty — the shell truncated it before awk started reading
# Also UNSAFE:
cat data.txt | awk '{ print $1 }' > data.txt
# The shell truncates data.txt for the redirection before cat has read it all
# (often before cat even opens it)
# Fix: use a temp file
awk '{ print $1 }' data.txt > data.tmp && mv data.tmp data.txt
# Fix: use sponge from moreutils (absorbs all input before writing)
awk '{ print $1 }' data.txt | sponge data.txt
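A hedge on the temp-file fix: a fixed name like data.tmp can clobber an unrelated file, and a leftover temp file lingers if awk fails. mktemp avoids the first problem and && the second. A self-contained sketch (the file contents are illustrative):

```shell
# Safe in-place edit: write to a mktemp file, replace the original only
# if awk succeeded
f=$(mktemp)
printf 'aa 1\nbb 2\n' > "$f"     # stand-in for data.txt
tmp=$(mktemp)
awk '{ print $1 }' "$f" > "$tmp" && mv "$tmp" "$f"
cat "$f"
# aa
# bb
```

The && guarantees the original file is only replaced when awk exits successfully.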
Rule: Never redirect output to the same file you are reading in a single pipeline. Use temp files or sponge.
6. BSD vs GNU awk differences¶
The differences go beyond what you might expect. Many scripts break when moving between Linux and macOS.
# GNU awk (gawk) vs BSD awk vs mawk — different defaults
# gawk supports \t in printf format, BSD awk might not
echo "a" | awk '{ printf "%s\t%s\n", $1, "b" }'
# GNU: a b
# Some old BSD awk: a\tb (literal backslash-t)
# GNU awk length(array) — not POSIX
awk 'BEGIN { a[1]=1; a[2]=2; print length(a) }'
# gawk: 2
# mawk: error or wrong result
# Fix: check what you're running
awk --version 2>/dev/null || awk -W version 2>/dev/null
# Fix: stick to POSIX features in portable scripts
# - No gawk-specific features (BEGINFILE, @include, etc.)
# - No length(array) — count manually with a for loop
# - No multi-char RS (only first character is used in POSIX)
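The manual count mentioned above looks like this: iterate the keys with for-in instead of calling length() on the array.

```shell
# POSIX-portable array size: count keys with a for-in loop
awk 'BEGIN {
  a["x"] = 1; a["y"] = 2; a["z"] = 3
  n = 0
  for (k in a) n++
  print n
}'
# 3
```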
7. Forgetting that awk splits on whitespace by default¶
When processing files with tabs or mixed whitespace, awk's default field splitting might not do what you expect.
# Input has multiple spaces between fields:
echo "name   Alice   admin" | awk '{ print $2 }'
# Output: Alice (correct: the default FS collapses runs of whitespace)
# A single space is special: -F' ' still means "default whitespace mode",
# NOT split-on-every-space:
echo "name   Alice   admin" | awk -F' ' '{ print $2 }'
# Output: Alice (same as the default)
# Any OTHER single-character FS splits on EVERY occurrence, so runs of
# separators create empty fields:
printf 'name\t\tAlice\n' | awk -F'\t' '{ print $2 }'
# Output: (empty!) because consecutive tabs produce an empty field between them
# Fix: use the default (no -F) for whitespace-delimited data
# Use -F'\t' for single-tab-separated data
# Use -F'[ \t]+' to collapse runs of mixed spaces and tabs explicitly
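One caveat on the regex separator: unlike the default FS, a regex FS does not strip leading whitespace, so an indented line gains an empty $1. A sketch:

```shell
# A regex FS collapses runs of spaces/tabs in the middle of the line...
printf 'name  \tAlice admin\n' | awk -F'[ \t]+' '{ print $2 }'
# Alice
# ...but leading whitespace now counts as a separator, creating an empty $1:
printf '  name Alice\n' | awk -F'[ \t]+' '{ print $1 "|" $2 }'
# |name
```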
8. awk uninitialized variables are zero/empty (not an error)¶
awk never complains about uninitialized variables. They default to 0 (numeric) or "" (string). This means typos in variable names produce silent wrong results.
# Typo in variable name — no error, just wrong results
awk '{ total += $1 } END { print totla }' numbers.txt
# Prints empty string — "totla" is a different (uninitialized) variable
# Checking for key existence in arrays — nonexistent key returns ""
awk '{ if (seen[$1] == "") print "new:", $1; seen[$1]++ }' data.txt
# BUG: after the if-check, seen[$1] NOW EXISTS (with value "")
# Fix: use "in" operator to check array membership
awk '{ if (!($1 in seen)) print "new:", $1; seen[$1]++ }' data.txt
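To see the element-creation side effect directly, count the keys after each kind of check (lookup is an illustrative array name):

```shell
echo "" | awk '{
  # Merely reading lookup["missing"] creates the key
  if (lookup["missing"] == "") x = 1
  n = 0; for (k in lookup) n++
  print "keys after == test:", n
  delete lookup["missing"]
  # The "in" operator tests membership WITHOUT creating the key
  if ("missing" in lookup) x = 1
  n = 0; for (k in lookup) n++
  print "keys after in test:", n
}'
# keys after == test: 1
# keys after in test: 0
```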