Portal | Level: L2: Operations | Topics: Log Pipelines, Logging, Loki | Domain: Observability
Log Pipelines - Primer¶
Why This Matters¶
Logs are the exhaust of your infrastructure. Every process, service, and kernel event writes log data somewhere. The challenge is not generating logs — it is collecting them from hundreds of sources, parsing them into something queryable, routing them to the right destination, and doing it all without losing data or drowning your storage.
A log pipeline is the plumbing between "application wrote a line to a file" and "engineer runs a query in Grafana." If the pipeline is broken, you are blind. If the pipeline is slow, your incident response is slow. If the pipeline drops data, you cannot do forensics.
Analogy: A log pipeline is plumbing. Sources are faucets, buffers are water tanks, destinations are sinks. If the drain is slow (destination overloaded), water backs up into the tank (buffer fills). When the tank overflows, you either flood the house (block the app) or let water run onto the floor (drop logs). Good plumbing means right-sized pipes and overflow drains.
Log Pipeline Architecture¶
┌─────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Sources │────▶│ Collection │────▶│ Processing │────▶│ Destinations │
│ │ │ │ │ │ │ │
│ App logs │ │ Fluentbit │ │ Parse │ │ Elasticsearch│
│ System logs │ │ Fluentd │ │ Filter │ │ S3 / GCS │
│ Container │ │ Vector │ │ Enrich │ │ Loki │
│ stdout/err │ │ Filebeat │ │ Route │ │ Kafka │
│ Syslog │ │ Promtail │ │ Buffer │ │ Datadog │
└─────────────┘ └──────────────┘ └──────────────┘ └──────────────┘
Key Concepts¶
| Concept | What It Means |
|---|---|
| Structured logging | JSON or key-value pairs vs free-text lines |
| Parsing | Extracting fields from unstructured text |
| Routing | Sending different logs to different destinations |
| Buffering | Holding logs in memory/disk when the destination is slow |
| Backpressure | What happens when the pipeline is full |
| At-least-once | Guarantee: logs may be duplicated but not lost |
| Exactly-once | Holy grail, rarely achievable in practice |
Structured vs Unstructured Logs¶
Unstructured (Bad for Pipelines)¶
Mar 15 14:23:01 web1 nginx: 192.168.1.1 - - [15/Mar/2024:14:23:01 +0000] "GET /api/users HTTP/1.1" 200 1234
You need a regex to extract the IP, status code, and path. If the format changes, the regex breaks.
Structured (Good for Pipelines)¶
{
"timestamp": "2024-03-15T14:23:01Z",
"host": "web1",
"service": "nginx",
"client_ip": "192.168.1.1",
"method": "GET",
"path": "/api/users",
"status": 200,
"bytes": 1234
}
Fields are already extracted. No parsing needed. Every tool in the pipeline can work with it directly.
Make the Decision at the Source¶
Application code ──▶ Write JSON logs ──▶ Pipeline reads JSON ──▶ No parsing needed
Application code ──▶ Write free text ──▶ Pipeline parses regex ──▶ Fragile, slow
If you control the application, always emit structured logs. Parsing is for things you do not control (system logs, third-party apps).
Remember: "Structure at the source, parse at the edge." The cheapest place to add structure to logs is in the application code (JSON output). The most expensive place is in the pipeline (regex parsing). Every regex parser is a maintenance burden and a latency cost. Convince developers to emit JSON and you eliminate an entire category of pipeline problems.
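As a sketch of structure-at-the-source, Python's stdlib logging can emit one JSON object per line with a custom formatter. The field names here are illustrative, not a standard:

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON line (one event per line)."""
    def format(self, record):
        entry = {
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%SZ"),
            "level": record.levelname.lower(),
            "service": record.name,
            "message": record.getMessage(),
        }
        return json.dumps(entry)

# Wire it up: anything logged through this logger is pipeline-ready JSON.
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("checkout")  # "checkout" is a hypothetical service name
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.info("request handled")
```

Downstream, every stage of the pipeline can now `json.loads` the line instead of guessing at a regex.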
Parsing Strategies¶
When you must parse, you have options:
Regex Parsing¶
# Nginx combined log format
^(?<client>[^ ]+) [^ ]+ (?<user>[^ ]+) \[(?<time>[^\]]+)\] "(?<method>\w+) (?<path>[^ ]+) [^"]+" (?<status>\d+) (?<bytes>\d+)
Pros: Flexible, handles any format. Cons: Slow, hard to maintain, breaks when format changes.
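The same pattern in Python, applied to the access line from the earlier example (Python spells named groups `?P<name>`, while Fluentd's Onigmo regex accepts `?<name>`):

```python
import re

# Python translation of the nginx regex above
NGINX_RE = re.compile(
    r'^(?P<client>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\w+) (?P<path>\S+) [^"]+" (?P<status>\d+) (?P<bytes>\d+)'
)

line = '192.168.1.1 - - [15/Mar/2024:14:23:01 +0000] "GET /api/users HTTP/1.1" 200 1234'
m = NGINX_RE.match(line)
fields = m.groupdict() if m else {}
# Note: every value is a string — regex extraction gives no types,
# so "status": "200" still needs casting downstream.
```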
JSON Parsing¶
Pros: Fast, reliable. Cons: Only works if the source emits JSON.
Key-Value Parsing¶
# Input: user=alice action=login status=success duration=0.45s
# Output: {user: "alice", action: "login", status: "success", duration: "0.45s"}
Pros: Common in application logs, easy to parse. Cons: No standard escaping for values with spaces.
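A minimal key-value parser is a few lines of Python, and the weakness noted above is visible in the code (`parse_kv` is a hypothetical helper, not a library API):

```python
def parse_kv(line):
    """Naive key=value parser: splits on whitespace, then on the first '='.
    Breaks on values containing spaces — exactly the escaping problem above."""
    out = {}
    for token in line.split():
        if "=" in token:
            key, _, value = token.partition("=")
            out[key] = value
    return out

parse_kv("user=alice action=login status=success duration=0.45s")
# {'user': 'alice', 'action': 'login', 'status': 'success', 'duration': '0.45s'}
```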
Delimiter / CSV Parsing¶
Splitting on a fixed delimiter (comma, tab, pipe) into positional fields. Pros: Fast and simple. Cons: Field meaning depends on column order; breaks when a value contains the delimiter unless quoting is honored.
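A sketch of delimiter parsing with Python's stdlib `csv` module, which honors quoting where a plain `split(",")` would not; the column order here is an assumption for illustration:

```python
import csv
import io

# Map positional CSV columns to field names (this column order is hypothetical)
COLUMNS = ["host", "service", "method", "path", "status"]

def parse_csv_line(line):
    row = next(csv.reader(io.StringIO(line)))  # quote-aware, unlike str.split(",")
    return dict(zip(COLUMNS, row))

parse_csv_line('web1,nginx,GET,"/search?q=a,b",200')
# The quoted field keeps its embedded comma: path == "/search?q=a,b"
```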
Grok Patterns (Logstash Heritage)¶
# Named patterns that compose
%{IP:client_ip} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:status}
Pros: Readable, reusable. Cons: Still regex under the hood, still slow.
The Big Three: Fluentbit, Fluentd, Vector¶
Fluentbit¶
- Written in C, tiny memory footprint (~2MB)
- Ideal for edge collection (run on every node)
- Config is INI-style (simple)
- Limited transformation capabilities
- Common in Kubernetes (DaemonSet)
# /etc/fluent-bit/fluent-bit.conf
[SERVICE]
    Flush             5
    Log_Level         info
    Daemon            off
    Parsers_File      parsers.conf

[INPUT]
    Name              tail
    Path              /var/log/app/*.log
    Tag               app.*
    Parser            json
    Refresh_Interval  5
    Mem_Buf_Limit     10MB

[FILTER]
    Name              modify
    Match             app.*
    Add               hostname ${HOSTNAME}
    Add               environment production

[OUTPUT]
    Name              forward
    Match             *
    Host              fluentd-aggregator
    Port              24224

[OUTPUT]
    Name              es
    Match             app.*
    Host              elasticsearch
    Port              9200
    Index             app-logs
    Type              _doc
Fluentd¶
- Written in Ruby + C
- Rich plugin ecosystem (800+ plugins)
- Good as an aggregator (central processing)
- Higher memory usage than Fluentbit
- Config is XML-like (more verbose)
# /etc/fluentd/fluent.conf
<source>
  @type forward
  port 24224
  bind 0.0.0.0
</source>

<filter app.**>
  @type parser
  key_name log
  <parse>
    @type json
  </parse>
</filter>

<filter app.**>
  @type record_transformer
  <record>
    cluster production-east
  </record>
</filter>

<match app.**>
  @type elasticsearch
  host elasticsearch.internal
  port 9200
  index_name app-logs
  <buffer>
    @type file
    path /var/log/fluentd/buffer/es
    flush_interval 5s
    chunk_limit_size 8MB
    total_limit_size 2GB
    retry_max_interval 30s
    overflow_action block
  </buffer>
</match>
Vector¶
- Written in Rust, high performance
- Single binary, does collection + aggregation
- Config is TOML (or YAML)
- Strong typing and transforms (VRL language)
- Growing ecosystem, fewer plugins than Fluentd
# /etc/vector/vector.toml
[sources.app_logs]
type = "file"
include = ["/var/log/app/*.log"]
read_from = "beginning"
[transforms.parse_json]
type = "remap"
inputs = ["app_logs"]
source = '''
. = parse_json!(.message)
.timestamp = now()
.hostname = get_hostname!()
'''
[transforms.filter_errors]
type = "filter"
inputs = ["parse_json"]
condition = '.level == "error" || .level == "fatal"'
[sinks.elasticsearch]
type = "elasticsearch"
inputs = ["parse_json"]
endpoints = ["http://elasticsearch:9200"]
bulk.index = "app-logs-%Y-%m-%d"
[sinks.error_alerts]
type = "http"
inputs = ["filter_errors"]
uri = "https://alerts.example.com/webhook"
encoding.codec = "json"
Choosing Between Them¶
> **Who made it:** Fluentd was created by Sadayuki Furuhashi (known as "frsyuki") at Treasure Data in 2011. It became a CNCF Graduated project in 2019. Fluent Bit was created by Eduardo Silva at Treasure Data in 2015 as a lightweight C alternative for resource-constrained environments. Vector was created by Timber Technologies (now part of Datadog) in 2019, written in Rust for maximum throughput.
Need lightweight edge agent? → Fluentbit
Need rich plugin ecosystem? → Fluentd
Need high throughput + transforms? → Vector
Kubernetes DaemonSet? → Fluentbit (or Vector)
Central aggregation layer? → Fluentd or Vector
Already in the Fluentd ecosystem? → Fluentbit (edge) → Fluentd (aggregator)
Greenfield deployment? → Vector (modern, fast)
Buffering and Backpressure¶
The most critical part of a log pipeline is what happens when the destination is slow or down.
Normal Flow
Source ──▶ Buffer ──▶ Destination (fast)
Backpressure
Source ──▶ Buffer ──▶ Destination (slow/down)
↑
Buffer fills up
What happens?
Option 1: Block → Source slows down (safe, but app may stall)
Option 2: Drop → Oldest or newest logs discarded (data loss)
Option 3: Overflow → Write to disk when memory buffer is full
Buffer Configuration Pattern¶
Memory buffer (fast, limited):
- First tier, handles normal traffic
- Set a cap (e.g., 64MB)
File buffer (slower, larger):
- Overflow from memory buffer
- Survives process restarts
- Set a cap (e.g., 2GB)
When both are full:
- Block: stop accepting new logs (protect data, risk app stall)
- Drop: discard logs (protect app, lose data)
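The block/drop decision above can be sketched as a toy bounded buffer. This is illustrative only: real agents add disk tiers, chunking, and retry logic on top of the same core choice.

```python
from collections import deque

class LogBuffer:
    """Toy in-memory buffer illustrating the two overflow policies.
    'block' refuses the event (caller must back off / retry);
    'drop_oldest' evicts the oldest event to make room."""
    def __init__(self, capacity, policy="block"):
        self.events = deque()
        self.capacity = capacity
        self.policy = policy
        self.dropped = 0  # monitor this — silent loss is the worst failure mode

    def push(self, event):
        if len(self.events) < self.capacity:
            self.events.append(event)
            return True
        if self.policy == "drop_oldest":
            self.events.popleft()   # discard oldest, count the loss
            self.dropped += 1
            self.events.append(event)
            return True
        return False  # 'block': caller sees backpressure and must slow down
```

With `policy="block"` the producer sees `False` and stalls (safe, app may slow); with `policy="drop_oldest"` the producer never stalls but `dropped` climbs (data loss you must alert on).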
Routing and Tagging¶
Most pipelines use tags or labels to route logs to different destinations:
app.web.access ──▶ Elasticsearch (hot, 7 days)
app.web.error ──▶ Elasticsearch (hot, 30 days) + PagerDuty
infra.syslog ──▶ S3 (cold, 365 days)
security.auth ──▶ SIEM + S3 (compliance, 7 years)
debug.* ──▶ /dev/null (in production)
This is what makes pipelines powerful — different log types get different treatment. Debug logs go to cheap storage (or nowhere). Security logs go to the SIEM with long retention. Application errors go to the place your team actually searches.
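Tag-based routing reduces to first-match-wins pattern lookup, sketched here with Python's `fnmatch` (note: `fnmatch`'s `*` crosses dots, unlike Fluentd's tag matching where only `**` does; the routing table and sink names are hypothetical):

```python
import fnmatch

# First matching pattern wins; an empty sink list means discard.
ROUTES = [
    ("security.*", ["siem", "s3_compliance"]),
    ("app.*.error", ["elasticsearch_hot", "pagerduty"]),
    ("app.*", ["elasticsearch_hot"]),
    ("debug.*", []),          # dropped in production
    ("*", ["s3_cold"]),       # catch-all
]

def route(tag):
    for pattern, sinks in ROUTES:
        if fnmatch.fnmatch(tag, pattern):
            return sinks
    return []

route("app.web.error")   # -> ["elasticsearch_hot", "pagerduty"]
route("debug.cache")     # -> []
```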
Multiline Log Handling¶
Stack traces and multi-line log entries need special treatment:
# A Java stack trace is one logical entry split across many lines:
2024-03-15 14:23:01 ERROR NullPointerException
at com.example.Service.process(Service.java:42)
at com.example.Handler.handle(Handler.java:18)
at java.lang.Thread.run(Thread.java:829)
# Fluentbit multiline parser (parsers.conf)
[MULTILINE_PARSER]
    name           java_stack
    type           regex
    flush_timeout  1000
    rule   "start_state"   "/^\d{4}-\d{2}-\d{2}/"       "cont"
    rule   "cont"          "/^\s+(at|Caused|\.{3})/"    "cont"

[INPUT]
    Name              tail
    Path              /var/log/app/app.log
    multiline.parser  java_stack
Without multiline handling, each line of a stack trace becomes a separate log entry — useless for debugging.
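The grouping logic itself is small. This Python sketch uses the same start-of-entry rule (a line beginning with a date) and folds continuation lines into the previous entry:

```python
import re

START = re.compile(r"^\d{4}-\d{2}-\d{2}")  # new entry starts with a timestamp

def group_multiline(lines):
    """Yield logical entries: a start line plus its continuation lines."""
    entry = []
    for line in lines:
        if START.match(line) and entry:
            yield "\n".join(entry)  # previous entry complete
            entry = []
        entry.append(line)
    if entry:
        yield "\n".join(entry)      # flush the final entry

raw = [
    "2024-03-15 14:23:01 ERROR NullPointerException",
    "    at com.example.Service.process(Service.java:42)",
    "2024-03-15 14:23:02 INFO request handled",
]
list(group_multiline(raw))  # 2 logical entries; the stack trace stays intact
```

The real agents also need a flush timeout (as in the config above), because the last entry in a file has no following start line to terminate it.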
Metrics to Monitor Your Pipeline¶
A log pipeline without monitoring is a log pipeline you will not know is broken:
- Input rate (events/sec)
- Output rate (events/sec)
- Buffer usage (bytes, % full)
- Retry count (destination failures)
- Drop count (lost events)
- Parse error count (bad format)
- Latency (time from input to destination)
When input rate > output rate persistently, your buffer is filling. Act before it overflows.
Gotcha: The most dangerous log pipeline failure is silent data loss. If your pipeline drops logs without alerting, you will not know until an incident investigation comes up empty. Always monitor the drop count metric and alert on it. A pipeline that blocks (slows the app) is safer than one that silently drops: at least you notice the slowdown.
Wiki Navigation¶
Prerequisites¶
- Observability Deep Dive (Topic Pack, L2)
Related Content¶
- Runbook: Log Pipeline Backpressure / Logs Not Appearing (Runbook, L2) — Log Pipelines, Loki
- Incident Simulator (18 scenarios) (CLI) (Exercise Set, L2) — Loki
- Interview: Loki Logs Disappeared (Scenario, L2) — Loki
- Lab: Loki No Logs (CLI) (Lab, L2) — Loki
- Linux Logging (Topic Pack, L1) — Logging
- Log Pipelines Flashcards (CLI) (flashcard_deck, L1) — Log Pipelines
- LogQL Drills (Drill, L2) — Loki
- Loki Flashcards (CLI) (flashcard_deck, L1) — Loki
- Observability Architecture (Reference, L2) — Loki
- Observability Deep Dive (Topic Pack, L2) — Loki
Pages that link here¶
- Anti-Primer: Log Pipelines
- Comparison: Logging Platforms
- Linux Logging
- Log Pipelines
- LogQL Drills
- Master Curriculum: 40 Weeks
- Observability Architecture
- Production Readiness Review: Answer Key
- Production Readiness Review: Study Plans
- Runbook: Log Pipeline Backpressure / Logs Not Appearing
- Scenario: Logs Disappeared from Grafana Loki