Linux Ops: systemd - Primer
Why This Matters
systemd is PID 1 on nearly every major Linux distribution. It controls which services start, in what order, how they restart on failure, and how resources are limited.
Fun fact: systemd was created by Lennart Poettering and Kay Sievers at Red Hat, first released in 2010. It replaced SysVinit (1983) and Upstart (2006). The name is intentionally lowercase — "system daemon." It was controversial because it replaced simple shell scripts with a complex binary system, but it won because of parallel boot, dependency resolution, and cgroup integration. By 2015, every major distro had adopted it.
Most engineers know systemctl start and stop. But
production incidents demand more: reading structured
logs, understanding dependency chains that cause
cascade failures, writing custom unit files, and
setting resource limits to prevent runaway processes.
Core Concepts
1. Unit Files
Everything in systemd is a unit. Main types:
| Type | Purpose | Example |
|---|---|---|
| service | Daemons and processes | nginx.service |
| timer | Scheduled execution | backup.timer |
| socket | Socket activation | cups.socket |
| target | Grouping/ordering | multi-user.target |
Unit file locations (highest priority first):
/etc/systemd/system/ # Admin overrides
/run/systemd/system/ # Runtime (transient)
/usr/lib/systemd/system/ # Vendor defaults
Never edit vendor files. Use systemctl edit <unit>
for drop-in overrides.
Remember: Unit file location priority is "ERC": /Etc (admin overrides) > /Run (runtime transient) > /usr/lib (Core vendor). When you run systemctl edit nginx, it creates a drop-in at /etc/systemd/system/nginx.service.d/override.conf. This survives package upgrades because vendor updates only touch /usr/lib/. If you edit vendor files directly, your changes are overwritten on the next package update.
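To make the drop-in mechanism concrete, here is a minimal sketch that builds the same file systemctl edit nginx would create, but under a scratch directory so it is safe to run anywhere. The ROOT variable and the Restart=/RestartSec= values are illustrative assumptions, not part of any real nginx package.

```shell
#!/bin/sh
# Sketch: write a drop-in override under a scratch root.
# ROOT is a hypothetical stand-in for "/" so nothing on the host is touched.
ROOT="${ROOT:-$(mktemp -d)}"
DROPIN_DIR="$ROOT/etc/systemd/system/nginx.service.d"

mkdir -p "$DROPIN_DIR"

# Only the settings being overridden go in the drop-in;
# everything else still comes from the vendor unit in /usr/lib.
cat > "$DROPIN_DIR/override.conf" <<'EOF'
[Service]
Restart=on-failure
RestartSec=5
EOF

echo "wrote $DROPIN_DIR/override.conf"

# On a real host the file lives under /etc/systemd/system/ and you would follow up with:
#   systemctl daemon-reload && systemctl restart nginx
#   systemctl cat nginx    # prints the vendor unit followed by the drop-in
```

Writing the file by hand is equivalent to what systemctl edit does, except that systemctl edit also reloads the manager configuration for you when the editor exits.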
2. Essential systemctl Commands
systemctl start|stop|restart|reload nginx
systemctl enable --now nginx # Start + boot persist
systemctl status nginx # State + recent logs
systemctl is-active nginx # Quick health check
systemctl list-units --failed # All failed units
systemctl list-timers # Active timers
systemctl daemon-reload # After editing units
3. journalctl Log Queries
journalctl -u nginx -f # Follow logs
journalctl -u nginx -n 100 # Last 100 lines
journalctl -u nginx -b # Since boot
journalctl -u nginx -p err # Error+ priority
journalctl -u nginx \
--since "2024-01-15 10:00" \
--until "2024-01-15 11:00"
journalctl -k # Kernel messages
journalctl -o json --no-pager # JSON for scripting
journalctl --disk-usage # Log disk usage
journalctl --vacuum-time=7d # Prune old logs
4. Service Dependencies
[Unit]
After=network-online.target postgresql.service
Requires=postgresql.service
Wants=redis.service
| Directive | Meaning |
|---|---|
| After= | Start ordering (not dependency) |
| Requires= | Hard dep: if it fails, we fail |
| Wants= | Soft dep: if it fails, we continue |
| BindsTo= | Like Requires + stop when it stops |
After= is ordering only. Requires= is dependency
only. You almost always need both together.
Gotcha: Requires=postgresql.service without After=postgresql.service starts both units simultaneously. Your app may start before PostgreSQL is ready, crash, and enter a restart loop. Always pair Requires= with After= for services that have startup-order dependencies. The Wants= + After= combo is preferred for soft dependencies where the dependency failing should not take down your service.
5. Restart Policies
| Restart= | When it restarts |
|---|---|
| no | Never (default) |
| always | Regardless of exit code |
| on-failure | Non-zero exit, signal, timeout, or watchdog |
| on-abnormal | Signal, timeout, or watchdog |
StartLimitBurst= and StartLimitIntervalSec= (both set in the [Unit] section) prevent
crash loops (e.g., 5 restarts in 300s = give up).
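The "5 restarts in 300s" example can be sketched as a drop-in. This is a minimal illustration that writes to a scratch directory; the myapp unit name, the ROOT variable, and the file name restart-limit.conf are assumptions chosen for the example.

```shell
#!/bin/sh
# Sketch: a drop-in that caps restart attempts for a hypothetical myapp.service.
ROOT="${ROOT:-$(mktemp -d)}"
UNIT_DIR="$ROOT/etc/systemd/system/myapp.service.d"

mkdir -p "$UNIT_DIR"

cat > "$UNIT_DIR/restart-limit.conf" <<'EOF'
[Unit]
# Give up after 5 start attempts within 300 seconds.
StartLimitIntervalSec=300
StartLimitBurst=5

[Service]
Restart=on-failure
# Wait 5 seconds between attempts.
RestartSec=5
EOF

echo "wrote $UNIT_DIR/restart-limit.conf"
```

The limit only trips when StartLimitBurst attempts fit inside StartLimitIntervalSec, so the interval should be longer than RestartSec multiplied by StartLimitBurst. Once a unit has hit the limit, systemctl reset-failed myapp clears the counter after you fix the underlying cause.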
6. Resource Limits via cgroups
Under the hood: systemd uses Linux cgroups (control groups) v2 to enforce resource limits. Each service runs in its own cgroup at /sys/fs/cgroup/system.slice/<service>.service/. MemoryMax sets a hard limit: the OOM killer fires when it is exceeded. MemoryHigh is a soft limit: the kernel aggressively reclaims memory but does not kill the process. Use MemoryHigh as an early warning and MemoryMax as the kill fence.
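A minimal sketch of the soft-limit and hard-limit pairing, written as a drop-in under a scratch directory. The myapp unit name, the ROOT variable, and the 800M/1G values are illustrative assumptions; pick limits from your service's observed memory profile.

```shell
#!/bin/sh
# Sketch: memory limits for a hypothetical myapp.service.
ROOT="${ROOT:-$(mktemp -d)}"
DROPIN_DIR="$ROOT/etc/systemd/system/myapp.service.d"

mkdir -p "$DROPIN_DIR"

cat > "$DROPIN_DIR/memory.conf" <<'EOF'
[Service]
# Soft limit: the kernel starts reclaiming aggressively above this.
MemoryHigh=800M
# Hard limit: the OOM killer acts when usage exceeds this.
MemoryMax=1G
EOF

echo "wrote $DROPIN_DIR/memory.conf"

# On a real host, after systemctl daemon-reload and a service restart,
# the applied values can be checked with:
#   systemctl show myapp -p MemoryHigh -p MemoryMax -p MemoryCurrent
#   cat /sys/fs/cgroup/system.slice/myapp.service/memory.max
```

Keeping MemoryHigh somewhat below MemoryMax leaves a band where the service slows down under reclaim pressure before it is killed, which gives monitoring a chance to alert first.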
7. Timer Units vs Cron
Service unit (the job):
# /etc/systemd/system/backup.service
[Unit]
Description=Database Backup
[Service]
Type=oneshot
ExecStart=/usr/local/bin/backup.sh
Timer unit (the schedule):
# /etc/systemd/system/backup.timer
[Unit]
Description=Run backup daily at 2am
[Timer]
OnCalendar=*-*-* 02:00:00
Persistent=true
RandomizedDelaySec=300
[Install]
WantedBy=timers.target
Advantages over cron: journalctl logging,
Persistent=true runs missed jobs, randomized delay
prevents thundering herd, resource limits apply.
Remember: Timer vs cron advantages mnemonic is "PLRR": Persistent (runs missed jobs), Logging (journalctl built-in), Randomized delay (no thundering herd), Resource limits (cgroup controls). Plain cron has none of these built in. For new scheduled jobs, prefer systemd timers.
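The backup.service and backup.timer files above do nothing until the timer is enabled. The sketch below lists the usual activation steps behind a dry-run wrapper, so it prints the commands by default instead of running them. The run helper and the DRY_RUN variable are hypothetical conveniences for this example.

```shell
#!/bin/sh
# Dry-run helper: print each command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

enable_backup_timer() {
    # Pick up the new unit files.
    run systemctl daemon-reload
    # Enable the .timer, not the .service: the timer triggers the service.
    run systemctl enable --now backup.timer
    # Confirm the next scheduled run.
    run systemctl list-timers backup.timer
    # Check how systemd parses the OnCalendar= expression.
    run systemd-analyze calendar "*-*-* 02:00:00"
}

enable_backup_timer
```

Run it with DRY_RUN=0 as root on a host that has the two unit files installed to execute the commands for real.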
8. Creating Custom Service Units
# /etc/systemd/system/myapp.service
[Unit]
Description=My Application Server
After=network-online.target
Wants=network-online.target
[Service]
Type=notify
ExecStart=/usr/local/bin/myapp --config /etc/myapp.conf
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-failure
RestartSec=5
User=myapp
Group=myapp
WorkingDirectory=/var/lib/myapp
EnvironmentFile=/etc/myapp/env
MemoryMax=1G
ProtectSystem=strict
NoNewPrivileges=true
[Install]
WantedBy=multi-user.target
9. Debugging Failed Services
Debug clue: When a service fails and journalctl -u myapp shows nothing useful, try running the exact ExecStart command manually as the service user: sudo -u myapp /usr/local/bin/myapp --config /etc/myapp.conf. Services often fail because of permission issues, missing environment variables, or working directory problems that are invisible in the journal but obvious when run interactively.
systemctl status myapp # State + logs
journalctl -u myapp -b # Full boot logs
systemctl show myapp -p NRestarts # Crash loop?
systemd-analyze verify myapp.service # Syntax check
# Run manually as the service user:
sudo -u myapp /usr/local/bin/myapp --config /etc/myapp.conf
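The NRestarts check can be scripted. The sketch below parses the NRestarts=<n> line that systemctl show prints and flags a likely crash loop. The function names and the threshold of 3 are assumptions for illustration, and sample input is used so the snippet runs without systemd.

```shell
#!/bin/sh
# Extract the number from a "NRestarts=<n>" line read on stdin.
parse_nrestarts() {
    sed -n 's/^NRestarts=\([0-9][0-9]*\)$/\1/p'
}

# Succeed when the restart count is at or above a threshold (default 3).
is_crash_looping() {
    count="$1"
    threshold="${2:-3}"
    [ "$count" -ge "$threshold" ]
}

# Sample input keeps the sketch runnable anywhere; on a real host use:
#   count="$(systemctl show myapp -p NRestarts | parse_nrestarts)"
count="$(printf 'NRestarts=7\n' | parse_nrestarts)"

if is_crash_looping "$count"; then
    echo "myapp restarted $count times: check journalctl -u myapp -b"
fi
```

A check like this fits well in a monitoring probe, since a rising restart count often shows up before the unit finally lands in the failed state.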
What Experienced People Know
- Run systemctl daemon-reload after any manual unit file edit. Forgetting it is the most common mistake.
- Restart=always without tuned StartLimitBurst= and StartLimitIntervalSec= can create endless crash loops that flood logs.
- Type=forking is legacy. Prefer simple or notify for new services.
- Use ProtectSystem=strict and NoNewPrivileges=true for free security hardening.
- systemd-analyze blame shows slow-starting services. Essential for boot optimization.
- Check TimeoutStopSec= if a service will not stop. Default is 90 seconds.
- ExecStartPre= runs before the main process. Use it for config validation or directory creation.
- Set SystemMaxUse= in /etc/systemd/journald.conf to prevent /var/log from filling your disk.
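For the journal size cap, a minimal sketch follows. It uses a journald.conf.d drop-in rather than editing /etc/systemd/journald.conf directly, which is a substitution made here so the main file stays untouched. The ROOT variable, the file name size-limit.conf, and the 500M value are illustrative assumptions.

```shell
#!/bin/sh
# Sketch: cap persistent journal size via a journald drop-in, under a scratch root.
ROOT="${ROOT:-$(mktemp -d)}"
CONF_DIR="$ROOT/etc/systemd/journald.conf.d"

mkdir -p "$CONF_DIR"

cat > "$CONF_DIR/size-limit.conf" <<'EOF'
[Journal]
# Upper bound on disk space used by the persistent journal.
SystemMaxUse=500M
EOF

echo "wrote $CONF_DIR/size-limit.conf"

# On a real host, apply and verify with:
#   systemctl restart systemd-journald
#   journalctl --disk-usage
```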