Quiz: Log Pipelines
4 questions
L0 (1 question)
1. What is the difference between structured and unstructured logging, and why does it matter for log pipelines?
Unstructured logs are free-text lines (e.g., the default Nginx access-log format) that require regex parsing to extract fields. Structured logs use JSON or key=value pairs where the fields are already labeled, e.g. {"level": "error", "msg": "disk full"}. Structured logs are far easier to process, route, and query in a pipeline because no fragile parsing step is needed.
L1 (1 question)
1. What are the four stages of a log pipeline and what tool handles each?
1. Collection — agents on each node collect logs (Fluent Bit, Promtail, Filebeat).
2. Processing — parse, filter, enrich, and transform (Fluentd, Vector, Logstash).
3. Buffering — hold logs when the destination is slow or down (memory/disk buffers in the pipeline tool).
4. Delivery — send to destinations (Elasticsearch, Loki, S3, Datadog). Many tools combine multiple stages.
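The four stages above can be sketched end-to-end in miniature. This is a toy in-process pipeline, not any real agent's API; the sample log lines, the `cluster` enrichment field, and the `delivered` list standing in for a destination are all hypothetical:

```python
import json
import shlex
from collections import deque

def collect():
    # 1. Collection: a real agent (Fluent Bit, Promtail, Filebeat) tails
    # files on each node; here we just yield fixed sample lines.
    yield 'level=error msg="disk full" host=node-17'
    yield 'level=info msg="healthy" host=node-03'

def process(line):
    # 2. Processing: parse key=value pairs into a structured record
    # and enrich it with a static field (hypothetical "cluster" tag).
    record = dict(tok.split("=", 1) for tok in shlex.split(line))
    record["cluster"] = "prod"
    return record

buffer = deque()   # 3. Buffering: holds records if the destination is slow.
delivered = []     # 4. Delivery: stand-in for Elasticsearch/Loki/S3.

for line in collect():
    buffer.append(process(line))
while buffer:
    delivered.append(json.dumps(buffer.popleft()))

print(len(delivered))  # 2
```

In a real deployment each stage is a separate concern (and often a separate tool), but the data flow — collect, transform, buffer, deliver — is exactly this shape.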
L2 (1 question)
1. Your log pipeline is dropping logs during traffic spikes. What is backpressure and how do you configure the pipeline to handle it?
Backpressure occurs when the pipeline produces data faster than the destination can consume it. Solutions: (1) add disk-based buffering so logs are not lost when memory fills; (2) increase destination write capacity; (3) configure retries with exponential backoff; (4) set up a dead-letter queue for undeliverable logs; (5) in Fluent Bit or Vector, tune buffer limits and flush intervals. The goal is at-least-once delivery — accept potential duplicates over data loss.
L3 (1 question)
1. You are designing a log pipeline for a 500-node Kubernetes cluster producing 10TB/day of logs. What architecture do you use?