Lab 5: Shell Scripting¶
| Field | Value |
|---|---|
| Tier | 1 — Foundations |
| Estimated Time | 30 minutes |
| Prerequisites | Bash basics |
| Auto-Grade | Yes |
Scenario¶
Your team runs five critical services on a bare-metal server: a web server, an API gateway, a cache (Redis), a database (PostgreSQL), and a message queue (RabbitMQ). Currently, there is no monitoring — someone just checks manually every few hours. Last week, the cache went down for six hours before anyone noticed. Customers complained about slow response times and your VP wants answers.
You have been asked to write a monitoring script that runs every minute via cron. The script must check each service, log results, and send an alert (write to an alert file) when any service is down. It should also track uptime percentages and generate a daily summary. The script needs to be production-grade: proper error handling, log rotation awareness, and clean exit codes.
Objectives¶
- Create
monitor.shthat checks 5 services via their health endpoints - Script logs results to
/tmp/lab-shell/logs/monitor.logwith timestamps - Script writes to
/tmp/lab-shell/alerts/active.txtwhen a service is down - Script clears alerts when a previously-down service recovers
- Script exits 0 when all services healthy, 1 when any service is down
- Script handles missing directories gracefully (creates them if needed)
- Script is idempotent (safe to run multiple times)
Setup¶
Creates simulated service health endpoints under /tmp/lab-shell/services/.
Hints¶
Hint 1: Checking service health
Each service has a health file in `/tmp/lab-shell/services/Hint 2: Timestamps in bash
Use `date '+%Y-%m-%d %H:%M:%S'` for ISO-like timestamps. Combine with service status: `echo "$(date '+%Y-%m-%d %H:%M:%S') [OK] web-server"`.Hint 3: Alert management
Write down services to an alert file, one per line. On each run, rebuild the alert file from scratch based on current status. If all services are healthy, the file should be empty or removed.Hint 4: Exit codes
Track a `failures` counter. Increment it for each unhealthy service. At the end: `exit $(( failures > 0 ? 1 : 0 ))`.Hint 5: set -euo pipefail
Always start with `set -euo pipefail` for production scripts. Use `|| true` for commands that are expected to fail (like checking a file that might not exist).Grading¶
Solution¶
See the solution/ directory for a complete monitoring script.