Linux Operations Drills¶
Remember: The five essential Linux diagnostic commands:
top (live CPU/memory), df -h (disk space), free -h (memory), journalctl -p err (recent errors), ss -tlnp (listening ports). Mnemonic: "TDFJ-S": Top, Disk-free, Free, Journalctl, Socket-stats. Together these five commands are usually enough to answer "what is wrong?" at the start of a Linux incident.
Drill 1: Find Top CPU Consumers¶
Difficulty: Easy
Q: Find the top 5 processes consuming the most CPU.
Answer
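The answer body for this drill is missing from the page. A minimal sketch, assuming the procps-ng `ps` shipped by most Linux distributions (the `top_cpu` helper name is hypothetical, not part of the original drill):

```shell
# Print the N busiest processes by CPU, plus the header row.
# Assumes procps-ng ps, which supports --sort; top_cpu is an illustrative name.
top_cpu() {
  n=${1:-5}
  ps -eo pid,user,pcpu,pmem,comm --sort=-pcpu | head -n "$((n + 1))"
}
```

`top_cpu 5` prints a header plus five rows. Note that `ps` reports CPU time divided by process lifetime, so a process that has only just started spinning may rank lower than expected; interactive `top` (press P to sort by CPU) gives the live view.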
Drill 2: Investigate Disk Space¶
Difficulty: Easy
Q: A server is reporting disk full. Find which directories are consuming the most space under /var.
Answer
# Quick overview
df -h
# Find biggest directories under /var
du -sh /var/* | sort -rh | head -10
# Go deeper into the largest directory
du -sh /var/log/* | sort -rh | head -10
# Find files larger than 100MB
find /var -type f -size +100M -exec ls -lh {} +
# Check for deleted files still held open
lsof +L1 | grep deleted
# These consume space but don't appear in du
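The "deleted but still open" case is easier to recognise once you have seen it happen. The sketch below reproduces it on a scratch file; it assumes a Linux `/proc` filesystem, and `deleted_fd_demo` is a hypothetical helper name.

```shell
# Show that an unlinked file stays alive while a descriptor is open.
# Linux-only: relies on /proc; deleted_fd_demo is an illustrative name.
deleted_fd_demo() {
  tmp=$(mktemp) || return 1
  exec 3>"$tmp"                 # hold the file open on fd 3
  rm -f "$tmp"                  # unlink it; du can no longer see it
  readlink /proc/self/fd/3      # the kernel marks the target "(deleted)"
  exec 3>&-                     # closing the descriptor frees the space
}
```

The printed path ends in `(deleted)`, which is the same marker `lsof` reports. On a real server, the space is only released when the owning process closes the file or is restarted.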
Drill 3: Systemd Service Management¶
Difficulty: Easy
Q: A service keeps crashing. Check its status, read its logs, and configure it to restart automatically.
Answer
# Check status
systemctl status myapp
# Read recent logs
journalctl -u myapp --since "30 min ago"
journalctl -u myapp -f # Follow live
# Check for restart settings in the unit file
systemctl cat myapp
# If no auto-restart, create an override:
systemctl edit myapp
# Add:
# [Service]
# Restart=on-failure
# RestartSec=5
systemctl daemon-reload # Redundant after "systemctl edit"; required if you edited unit files by hand
systemctl restart myapp
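`systemctl edit` is interactive, which is awkward in scripts. As a sketch of the non-interactive equivalent, the function below writes the same drop-in file by hand. The helper name `write_restart_override` and its optional second argument (an alternative root directory, handy for dry runs) are assumptions for illustration.

```shell
# Write the same override that `systemctl edit` would create interactively.
# write_restart_override is an illustrative name; the second argument lets
# you target a scratch directory instead of /etc/systemd/system.
write_restart_override() {
  unit=$1
  root=${2:-/etc/systemd/system}
  dropin="$root/$unit.service.d"
  mkdir -p "$dropin" || return 1
  printf '[Service]\nRestart=on-failure\nRestartSec=5\n' > "$dropin/override.conf"
}
```

Run as root, `write_restart_override myapp` would need to be followed by `systemctl daemon-reload` and `systemctl restart myapp`, because nothing reloads systemd for you when the file is written directly.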
Drill 4: Memory Pressure Investigation¶
Difficulty: Medium
Q: Applications are being OOM-killed. Investigate memory usage and identify the culprit.
Answer
# Check for OOM kills
dmesg | grep -i "out of memory"
journalctl -k | grep -i "oom"
# Current memory overview
free -h
# Per-process memory usage (sorted)
ps aux --sort=-%mem | head -10
# Detailed memory breakdown
head -20 /proc/meminfo
# Check swap usage
swapon --show
vmstat 1 5 # Watch si/so columns for swap activity
# Find specific process memory
pmap -x "$(pgrep -o java)" | tail -5 # -o picks the oldest matching PID; the last line is the total
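When scanning `/proc/meminfo`, the `MemAvailable` field is usually the most useful single number. A small sketch that turns it into a percentage of `MemTotal`; the helper name `mem_available_pct` and the optional file argument are assumptions added for illustration.

```shell
# Print MemAvailable as a whole-number percentage of MemTotal.
# Reads /proc/meminfo by default; pass another file in the same format to test.
# mem_available_pct is an illustrative name.
mem_available_pct() {
  awk '/^MemTotal:/     { total = $2 }
       /^MemAvailable:/ { avail = $2 }
       END { if (total > 0) printf "%d\n", avail * 100 / total }' "${1:-/proc/meminfo}"
}
```

A persistently low value alongside non-zero `si`/`so` in `vmstat` suggests genuine memory pressure rather than memory that is merely in use as page cache.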
Drill 5: Network Connectivity Debugging¶
Difficulty: Medium
Q: A server can't reach an external API on port 443. Systematically debug the issue.
Answer
# 1. DNS resolution
dig api.example.com +short
# Or: nslookup api.example.com
# 2. Can we reach the IP?
ping -c 3 $(dig +short api.example.com | head -1)
# 3. Can we reach the port?
nc -zv -w 5 api.example.com 443
# Or: curl -v --connect-timeout 5 https://api.example.com
# 4. Check local firewall
iptables -L -n | grep 443
# Or: nft list ruleset | grep 443
# 5. Check routing
ip route get "$(dig +short api.example.com | tail -n 1)" # tail skips any CNAME lines and keeps an address
traceroute api.example.com
# 6. Check if a proxy is needed
env | grep -i proxy
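On minimal hosts `nc` is often not installed. A hedged alternative for step 3 uses bash's built-in `/dev/tcp` pseudo-device together with coreutils `timeout`; the `tcp_check` name and its default timeout are assumptions for illustration.

```shell
# Return 0 if a TCP connection to HOST:PORT succeeds within SECS seconds.
# Uses bash's /dev/tcp pseudo-device and coreutils timeout; tcp_check is an
# illustrative name.
tcp_check() {
  host=$1
  port=$2
  secs=${3:-5}
  timeout "$secs" bash -c ': < "/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}
```

For example, `tcp_check api.example.com 443 && echo open || echo blocked`. This only proves the TCP handshake works; TLS or proxy problems still need `curl -v`.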
Drill 6: File Permissions Troubleshooting¶
Difficulty: Medium
Q: A web server returns 403 Forbidden. The config looks correct. What do you check?
Answer
# 1. Check file permissions
ls -la /var/www/html/index.html
# Web server user (www-data/nginx) needs read access
# 2. Check directory permissions (traverse requires +x)
namei -l /var/www/html/index.html
# Every directory in the path needs execute (x) permission
# 3. Check file ownership
stat /var/www/html/index.html
# 4. Fix permissions
chmod 644 /var/www/html/index.html # File: rw-r--r--
chmod 755 /var/www/html/ # Dir: rwxr-xr-x
# 5. Check SELinux (if enabled)
getenforce
ls -Z /var/www/html/index.html
restorecon -Rv /var/www/html/
# 6. Check AppArmor (if enabled)
aa-status
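Step 2 (every directory in the path needs execute permission) can be sketched as a loop. This is a simplification: it only inspects the "other" permission bits, so it ignores owner and group matches, ACLs, and SELinux. It assumes GNU `stat`, and `first_blocked_dir` is a hypothetical helper name.

```shell
# Walk from a file up to /, printing the first directory that lacks the
# "other" execute bit. Returns 1 if one is found, 0 if the path is traversable.
# Simplification: ignores owner/group matches and ACLs.
# Assumes GNU stat; first_blocked_dir is an illustrative name.
first_blocked_dir() {
  dir=$(dirname "$1")
  while :; do
    mode=$(stat -c '%a' "$dir") || return 2
    if [ $(( (mode % 10) & 1 )) -eq 0 ]; then
      printf '%s\n' "$dir"
      return 1
    fi
    [ "$dir" = "/" ] && return 0
    dir=$(dirname "$dir")
  done
}
```

`first_blocked_dir /var/www/html/index.html` prints nothing when every parent is world-traversable; otherwise it names the deepest directory to fix.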
Drill 7: Cron Job Debugging¶
Difficulty: Medium
Q: A cron job isn't running. How do you debug it?
Answer
# 1. Is cron running?
systemctl status cron # or crond
# 2. Check the crontab
crontab -l # Current user
crontab -l -u www-data # Specific user
# 3. Check cron logs
grep CRON /var/log/syslog | tail -20 # Debian/Ubuntu; RHEL-family systems log to /var/log/cron
# Or: journalctl -u cron --since "1 hour ago"
# 4. Common issues:
# - Wrong PATH (cron has minimal PATH)
# Fix: add PATH=/usr/local/bin:/usr/bin:/bin at top of crontab
# - Missing executable permission
# - Script uses relative paths (cron runs from $HOME)
# - Environment variables not set
# 5. Test the command manually:
env -i PATH=/usr/local/bin:/usr/bin:/bin HOME=/root /opt/backup.sh
# env -i simulates cron's minimal environment
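The `env -i` trick generalises into a small wrapper for testing any command. In this sketch the `PATH` and `SHELL` values are assumptions modelled on common cron defaults (check `man 5 crontab` on your system), and `run_like_cron` is a hypothetical name.

```shell
# Run a command with a stripped-down environment similar to cron's.
# The PATH and SHELL values are assumed defaults, not read from your cron
# daemon; run_like_cron is an illustrative name.
run_like_cron() {
  env -i HOME="${HOME:-/}" PATH=/usr/bin:/bin SHELL=/bin/sh "$@"
}
```

For example, `run_like_cron /opt/backup.sh` fails in the same way the cron job would if the script depends on a variable or a `PATH` entry that only exists in your login shell.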
Drill 8: SSH Troubleshooting¶
Difficulty: Medium
Q: You can't SSH into a server. Walk through the debug process.
Answer
# 1. Verbose SSH connection
ssh -vvv user@host
# 2. Check from the server side (if you have console/BMC access)
systemctl status sshd # The unit is named "ssh" on Debian/Ubuntu
ss -tlnp 'sport = :22' # Is sshd listening? (filters on the port instead of grepping for "22")
journalctl -u sshd --since "5 min ago"
# 3. Common causes:
# - Firewall blocking port 22
iptables -L -n | grep 22
# - Wrong permissions on ~/.ssh
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
chmod 600 ~/.ssh/id_ed25519
# - sshd_config denying user
grep -E "AllowUsers|DenyUsers|AllowGroups" /etc/ssh/sshd_config
# - Host key changed (possible MITM: verify the new fingerprint out of band first)
ssh-keygen -R hostname # Remove the old key
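The permission fixes above can be checked before changing anything. A minimal sketch that reports deviations from the 700/600 modes used in this drill; it assumes GNU `stat`, and `check_ssh_perms` is a hypothetical helper name.

```shell
# Report ~/.ssh entries whose mode differs from the 700/600 values above.
# Prints nothing when everything matches. Assumes GNU stat;
# check_ssh_perms is an illustrative name.
check_ssh_perms() {
  d=${1:-$HOME/.ssh}
  m=$(stat -c '%a' "$d") || return 2
  [ "$m" = 700 ] || printf '%s: mode %s, expected 700\n' "$d" "$m"
  for f in "$d"/authorized_keys "$d"/id_*; do
    [ -f "$f" ] || continue
    case "$f" in *.pub) continue ;; esac   # public keys may be world-readable
    m=$(stat -c '%a' "$f")
    [ "$m" = 600 ] || printf '%s: mode %s, expected 600\n' "$f" "$m"
  done
}
```

Run it as the affected user on the server side. An empty result means permissions are probably not the cause, and the `sshd` logs are the next place to look.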
Drill 9: Log Analysis with Command Line¶
Difficulty: Hard
Q: Find the top 10 IPs making requests to nginx, show requests per second for the last hour, and find all 5xx errors.
Answer
# Top 10 IPs
awk '{print $1}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -10
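# Requests per second (last hour), busiest seconds first
# The $4 filter is a plain string comparison, so it is only reliable when the log covers a single day
awk -v start="$(date -d '1 hour ago' '+%d/%b/%Y:%H:%M')" \
'$4 >= "["start' /var/log/nginx/access.log | \
awk '{print $4}' | cut -d: -f1-4 | uniq -c | sort -rn | head -10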
# All 5xx errors
awk '$9 ~ /^5/' /var/log/nginx/access.log | tail -20
# 5xx count by status code
awk '$9 ~ /^5/ {count[$9]++} END {for (s in count) print s, count[s]}' \
/var/log/nginx/access.log
# 5xx errors by URL
awk '$9 ~ /^5/ {count[$7]++} END {for (u in count) print count[u], u}' \
/var/log/nginx/access.log | sort -rn | head -10
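The per-second count can also be done in a single `awk` pass, which avoids relying on the log being in strict time order. This sketch assumes nginx's default "combined" log format, where field 4 is the bracketed timestamp; `requests_per_second` is a hypothetical helper name.

```shell
# Count requests per second and list the busiest seconds first.
# Assumes the nginx "combined" log format (timestamp in field 4).
# Reads the files given as arguments, or stdin if there are none.
# requests_per_second is an illustrative name.
requests_per_second() {
  awk '{ ts = substr($4, 2); n[ts]++ }
       END { for (t in n) print n[t], t }' "$@" |
    sort -k1,1nr -k2 | head -n 10
}
```

For example, `requests_per_second /var/log/nginx/access.log`, or pipe in the output of the last-hour filter above to restrict the window.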
Drill 10: Performance Triage (USE Method)¶
Difficulty: Hard
Q: A server is "slow." Run through a systematic performance investigation.
Answer
# 1. Load average (CPU saturation)
uptime
# load > num_cpus = saturation
# 2. CPU
mpstat -P ALL 1 3 # Per-CPU utilization
# High %iowait = disk bottleneck, not CPU
# 3. Memory
free -h # Available memory
vmstat 1 5 # si/so = swap in/out (should be 0)
# 4. Disk I/O
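iostat -xz 1 3 # %util, await (r_await/w_await), queue size (avgqu-sz, or aqu-sz in newer sysstat)
# %util > 80% = likely saturated (less reliable on SSD/NVMe and RAID, which serve requests in parallel)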
# await > 10ms (SSD) or > 20ms (HDD) = slow
# 5. Network
sar -n DEV 1 3 # Interface throughput
ss -s # Connection counts
# 6. Per-process
pidstat -u -d -r 1 3 # CPU, disk, memory per process
# 7. Recent errors
dmesg -T | tail -20 # Kernel messages
journalctl -p err -b # Errors since boot
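The step 1 rule of thumb (load above the CPU count means saturation) is easy to script. A minimal sketch, assuming Linux `/proc/loadavg` and coreutils `nproc`; `load_saturated` and its optional arguments are hypothetical and exist so the comparison can be tried with fixed numbers.

```shell
# Succeed (exit 0) when the 1-minute load average exceeds the CPU count.
# Defaults come from /proc/loadavg and nproc; both can be overridden.
# load_saturated is an illustrative name.
load_saturated() {
  load=${1:-$(cut -d' ' -f1 /proc/loadavg)}
  cpus=${2:-$(nproc)}
  awk -v l="$load" -v c="$cpus" 'BEGIN { exit !(l > c) }'
}
```

For example, `load_saturated && echo "CPU saturated"`. On Linux, load average also counts tasks in uninterruptible I/O wait, so confirm with `mpstat` before blaming the CPU.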