Lab 5: Shell Scripting¶

Field	Value
Tier	1 — Foundations
Estimated Time	30 minutes
Prerequisites	Bash basics
Auto-Grade	Yes

Scenario¶

Your team runs five critical services on a bare-metal server: a web server, an API gateway, a cache (Redis), a database (PostgreSQL), and a message queue (RabbitMQ). Currently, there is no monitoring — someone just checks manually every few hours. Last week, the cache went down for six hours before anyone noticed. Customers complained about slow response times and your VP wants answers.

You have been asked to write a monitoring script that runs every minute via cron. The script must check each service, log results, and send an alert (write to an alert file) when any service is down. It should also track uptime percentages and generate a daily summary. The script needs to be production-grade: proper error handling, log rotation awareness, and clean exit codes.

Objectives¶

Create monitor.sh that checks 5 services via their health endpoints
Script logs results to /tmp/lab-shell/logs/monitor.log with timestamps
Script writes to /tmp/lab-shell/alerts/active.txt when a service is down
Script clears alerts when a previously-down service recovers
Script exits 0 when all services healthy, 1 when any service is down
Script handles missing directories gracefully (creates them if needed)
Script is idempotent (safe to run multiple times)

Setup¶

./setup.sh

Creates simulated service health endpoints under /tmp/lab-shell/services/.

Hints¶

Hint 1: Checking service health

Each service has a health file in `/tmp/lab-shell/services//health`. If the file contains "ok", the service is healthy. If it contains anything else or is missing, the service is down.

Hint 2: Timestamps in bash

Use `date '+%Y-%m-%d %H:%M:%S'` for ISO-like timestamps. Combine with service status: `echo "$(date '+%Y-%m-%d %H:%M:%S') [OK] web-server"`.

Hint 3: Alert management

Write down services to an alert file, one per line. On each run, rebuild the alert file from scratch based on current status. If all services are healthy, the file should be empty or removed.

Hint 4: Exit codes

Track a `failures` counter. Increment it for each unhealthy service. At the end: `exit $(( failures > 0 ? 1 : 0 ))`.

Hint 5: set -euo pipefail

Always start with `set -euo pipefail` for production scripts. Use `|| true` for commands that are expected to fail (like checking a file that might not exist).

Grading¶

./grade.sh

Solution¶

See the solution/ directory for a complete monitoring script.