Skip to content

awk — Trivia & Interesting Facts

Surprising, historical, and little-known facts about awk.


awk is named after its three creators at Bell Labs

AWK was created in 1977 by Alfred Aho, Peter Weinberger, and Brian Kernighan at Bell Labs. Each contributed a letter to the name. Aho was a formal languages expert (also created egrep), Weinberger worked on databases, and Kernighan co-authored "The C Programming Language." The trio designed awk as a data-driven programming language for text processing.


awk is a complete programming language, not just a text tool

awk has variables, arrays (associative), functions, loops, conditionals, printf formatting, regular expressions, and mathematical operations. It can implement sorting algorithms, process CSV files, and perform statistical analysis. Some developers have written HTTP servers, compilers, and even games entirely in awk. Despite this power, most people use awk only for awk '{print $2}'.


Brian Kernighan wrote "The AWK Programming Language" — twice

Kernighan co-authored the original awk book in 1988. In 2024 — 36 years later — he published a second edition, updated for modern usage patterns including Unicode support, CSV handling, and the one-true-awk implementation that Kernighan himself maintains. It is one of the longest gaps between editions of a programming book.


gawk (GNU awk) adds networking, time functions, and two-way pipes

GNU awk extends the language with features the original authors never imagined: TCP/IP networking (/inet/tcp/0/host/port), time functions (mktime, systime, strftime), sorted array traversal (PROCINFO["sorted_in"]), and two-way pipes for interactive subprocesses. gawk is the default awk on most Linux distributions and is actively maintained.


awk's field separator can be a regular expression

Most people know that awk -F',' splits on commas, but the field separator can be any regular expression: awk -F'[;,\t]' splits on semicolons, commas, or tabs simultaneously. The built-in variable FS (field separator) and RS (record separator) can be changed mid-program, allowing awk to process files with mixed delimiters in a single pass.


The awk one-liner for deduplication is an elegant hack

awk '!seen[$0]++' removes duplicate lines while preserving order — something sort -u cannot do (it requires sorted input). The trick: seen[$0] starts at 0 (falsy), so !seen[$0] is true on first occurrence. The ++ increments it to 1, making subsequent occurrences falsy. The entire program is an implicit print condition. This one-liner appears in thousands of shell scripts worldwide.


mawk is often faster than gawk — and it matters at scale

Michael Brennan's mawk (Mike's AWK) is a lightweight implementation optimized for speed. In benchmarks processing multi-gigabyte log files, mawk can be 2-10x faster than gawk because it uses a bytecode interpreter rather than gawk's tree-walking interpreter. Debian and Ubuntu default to mawk, while Red Hat defaults to gawk. The performance difference is negligible for small files but significant at log-processing scale.