Quiz: Linux Performance Tuning

7 questions

L0 (2 questions)

1. What is the USE method for performance analysis?

Show answer

For every resource (CPU, memory, disk, network), check three things: Utilization (how busy is it?), Saturation (is work queuing?), and Errors (are there error events?). High utilization is not automatically a problem; saturation always is. Tools: mpstat/iostat for utilization, the vmstat run queue (the r column) for CPU saturation, dmesg/ethtool for errors.
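As a rough illustration of the saturation check for CPU, the sketch below reads the `procs_running` and `procs_blocked` counters from `/proc/stat`-style text and compares the runnable count with the CPU count. The function names and the "more runnable tasks than CPUs" rule are assumptions made for this example, not part of any standard tool.

```python
def parse_proc_stat_queue(text):
    """Return (procs_running, procs_blocked) from /proc/stat-style text."""
    values = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("procs_running", "procs_blocked"):
            values[parts[0]] = int(parts[1])
    return values.get("procs_running", 0), values.get("procs_blocked", 0)


def cpu_saturated(procs_running, cpu_count):
    """Assumed rule of thumb: more runnable tasks than CPUs means work is queuing."""
    return procs_running > cpu_count
```

On a Linux host this could be fed with `open("/proc/stat").read()` and `os.cpu_count()`. A single sample is noisy, so repeated samples give a better picture.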

2. What do the three load average numbers mean and when is load 'too high'?

Show answer

Load averages (from uptime or /proc/loadavg) show the average number of tasks that are runnable or in uninterruptible sleep over the last 1, 5, and 15 minutes. On a 4-core system, a load of 4.0 means the CPUs are roughly fully utilized if the load is CPU-bound; load above the number of cores means tasks are queuing. Compare 1-min vs 15-min: 1-min much higher = recent spike; 15-min higher = load is subsiding. High load with low CPU usage usually points to I/O wait (check with vmstat or iostat): Linux counts tasks in uninterruptible sleep, which are typically waiting on disk, so disk-bound processes inflate load.
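The per-core comparison and the 1-minute versus 15-minute trend can be sketched as follows. This is a minimal example that assumes the usual `/proc/loadavg` layout (three load figures first); the function names and trend labels are illustrative.

```python
def parse_loadavg(text):
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg-style text."""
    one, five, fifteen = (float(x) for x in text.split()[:3])
    return one, five, fifteen


def describe_load(one, fifteen, cpu_count):
    """Normalize the 1-minute load by CPU count and compare it with the 15-minute value."""
    per_cpu = one / cpu_count
    if one > fifteen:
        trend = "rising"
    elif one < fifteen:
        trend = "falling"
    else:
        trend = "steady"
    return per_cpu, trend
```

A per-CPU value above 1.0 corresponds to "load > number of cores" in the answer above.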

L1 (2 questions)

1. What is the difference between 'free' and 'available' memory in 'free -h' output, and why does low 'free' not mean a problem?

Show answer

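Free is memory that is not used for anything. Available is the kernel's estimate of how much memory applications can obtain without swapping: free memory plus reclaimable page cache and other reclaimable kernel memory. Linux aggressively uses spare RAM for page cache (caching file data read from and written to disk) and reclaims it automatically when applications need it. So 512MB free but 11GB available means the system has about 11GB of usable memory. Low 'free' with high 'available' is normal and healthy.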
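The same two figures can be read directly from `/proc/meminfo`, which is where `free` gets them. The sketch below assumes the `MemFree` and `MemAvailable` fields are present (`MemAvailable` is missing on very old kernels); the helper names are illustrative.

```python
def parse_meminfo(text):
    """Map /proc/meminfo field names to their integer values (kB for most fields)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            fields[name.strip()] = int(parts[0])
    return fields


def memory_summary_mib(fields):
    """Return (free, available) in MiB from parsed meminfo fields."""
    return fields["MemFree"] // 1024, fields["MemAvailable"] // 1024
```

With input matching the example in the answer, the summary would show a small free value next to a much larger available value.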

2. How do you use dmesg to diagnose kernel-level issues?

Show answer

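dmesg -T prints human-readable wall-clock timestamps; dmesg --human (-H) gives paged output with relative timestamps. Key patterns: OOM killer messages (Out of memory: Kill process or Killed process, depending on kernel version), disk I/O errors (I/O error, dev sdX), ECC memory errors (EDAC), NIC link state changes (link up/down), filesystem errors (EXT4-fs error). Use dmesg -l err,warn for errors/warnings only. dmesg -r prints the raw buffer, including the numeric priority prefixes. The kernel ring buffer is finite, so old messages get overwritten, and it does not survive a reboot; after a crash, look at persisted logs instead (for example journalctl -k -b -1 on systemd systems with a persistent journal).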
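When there is a lot of output, the key patterns listed above can be turned into a small filter. This is only a sketch: the category names are made up for the example, and real kernel messages vary by kernel version and driver, so the regular expressions should be treated as starting points.

```python
import re

# Patterns taken from the answer above; real messages differ between kernels and drivers.
PATTERNS = {
    "oom": re.compile(r"Out of memory", re.IGNORECASE),
    "disk_io": re.compile(r"I/O error"),
    "ecc": re.compile(r"EDAC"),
    "link": re.compile(r"link (is )?(up|down)", re.IGNORECASE),
    "filesystem": re.compile(r"EXT4-fs error"),
}


def classify(line):
    """Return the sorted list of categories whose pattern appears in the line."""
    return sorted(name for name, rx in PATTERNS.items() if rx.search(line))


def count_categories(lines):
    """Count how many lines match each category."""
    counts = {name: 0 for name in PATTERNS}
    for line in lines:
        for name in classify(line):
            counts[name] += 1
    return counts
```

The input could come from `subprocess.run(["dmesg"], capture_output=True, text=True).stdout.splitlines()`, which may require elevated privileges on some systems.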

L2 (2 questions)

1. How do you read iostat output to determine if a disk is the bottleneck?

Show answer

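Run iostat -x for extended statistics. Key columns: %util above roughly 70% on a spinning disk suggests it is approaching saturation (on NVMe and other devices that serve requests in parallel, %util is misleading). 'await' (split into r_await and w_await in newer sysstat versions) is the average I/O latency in ms, including time spent queued; this is what applications feel. High await with low %util suggests the disk is slow per-operation. High 'r/s' or 'w/s' with high await suggests the device is IOPS-limited. Check 'rrqm/s' and 'wrqm/s' for request merging efficiency.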
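To see where await and %util come from, the sketch below derives them from two samples of a `/proc/diskstats` line taken `interval_ms` apart. It assumes the standard field order (reads completed, reads merged, sectors read, ms reading, then the same four for writes, I/Os in progress, ms doing I/O) and combines reads and writes into one await figure. It approximates the calculation and is not a copy of iostat's code; the function names are illustrative.

```python
def parse_diskstats_line(line):
    """Pick the counters needed for await and %util from one /proc/diskstats line."""
    f = line.split()
    return {
        "name": f[2],
        "ios": int(f[3]) + int(f[7]),          # reads + writes completed
        "io_wait_ms": int(f[6]) + int(f[10]),  # ms spent on reads + writes
        "busy_ms": int(f[12]),                 # ms the device had I/O in flight
    }


def await_and_util(before, after, interval_ms):
    """Return (average ms per completed I/O, percent of the interval the device was busy)."""
    ios = after["ios"] - before["ios"]
    wait = after["io_wait_ms"] - before["io_wait_ms"]
    busy = after["busy_ms"] - before["busy_ms"]
    avg_wait = wait / ios if ios else 0.0
    util = 100.0 * busy / interval_ms
    return avg_wait, util
```

Because busy time only records whether any I/O was in flight, a device that handles many requests in parallel can show 100% here while still having headroom, which is why %util misleads on NVMe.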

2. What are cgroups v2 and how do you set resource limits with them?

Show answer

cgroups v2 is the unified hierarchy for resource control (CPU, memory, I/O, PIDs), mounted at /sys/fs/cgroup. Set limits by writing to a group's control files: echo "50000 100000" > /sys/fs/cgroup/mygroup/cpu.max (quota and period in microseconds: 50ms of CPU time per 100ms period, i.e. 50% of one CPU). echo 1G > /sys/fs/cgroup/mygroup/memory.max. Each controller must be enabled for the group through the parent's cgroup.subtree_control. Key controllers: cpu (bandwidth), memory (usage + swap), io (I/O throttling, the successor to v1 blkio), pids (fork bomb prevention). Systemd uses cgroups v2 natively: CPUQuota=50%, MemoryMax=1G in unit files. Check usage: cat /sys/fs/cgroup/mygroup/memory.current.
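The `cpu.max` file takes a quota and a period, both in microseconds, and it is easy to get the ratio wrong by a factor of ten. The following sketch converts between a CPU allowance and the file's text format. The helper names are hypothetical, and 100000 is used as the period because it is the kernel default.

```python
def cpu_max_value(cpus, period_us=100000):
    """Format a cpu.max line granting `cpus` CPUs' worth of time per period."""
    quota_us = int(round(cpus * period_us))
    return f"{quota_us} {period_us}"


def cpu_max_fraction(value):
    """Return the CPU allowance encoded in a cpu.max line, or None for 'max' (unlimited)."""
    quota, period = value.split()
    if quota == "max":
        return None
    return int(quota) / int(period)
```

For instance, `cpu_max_value(0.5)` yields `50000 100000`, while `cpu_max_fraction("500000 100000")` evaluates to 5.0, i.e. five full CPUs.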

L3 (1 question)

1. Describe the correct methodology for sysctl tuning instead of copying values from blog posts.

Show answer

1. Baseline: measure current performance with a realistic workload.
2. Identify the bottleneck: apply the USE method across CPU, memory, disk, and network.
3. Research the knob: read the kernel documentation, not Medium posts.
4. Change one thing: a single variable at a time.
5. Measure again: same workload, same measurement method.
6. Persist or revert: if the improvement is confirmed, add the setting to a file under /etc/sysctl.d/; otherwise revert it. Never cargo-cult values without measuring before and after.
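The measure-change-measure loop ends in a persist-or-revert decision, which can be sketched as a comparison of two sets of benchmark results. The 5% threshold, the function names, and the use of a plain mean are assumptions for illustration; a real comparison should also account for run-to-run variance.

```python
from statistics import mean


def relative_change(baseline, candidate):
    """Relative change of the candidate mean versus the baseline mean."""
    base = mean(baseline)
    return (mean(candidate) - base) / base


def decide(baseline, candidate, min_gain=0.05, higher_is_better=True):
    """Return 'persist' if the change beats the assumed threshold, else 'revert'."""
    change = relative_change(baseline, candidate)
    gain = change if higher_is_better else -change
    return "persist" if gain >= min_gain else "revert"
```

Here `baseline` and `candidate` are lists of measurements (for example requests per second, or latency with `higher_is_better=False`) taken with the same workload before and after changing a single sysctl.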