systemd: The Init System You Can't Avoid


Topics: systemd, unit files, journald, timers, cgroups, socket activation, security hardening
Level: L1–L2 (Foundations → Operations)
Time: 60–90 minutes
Prerequisites: None (everything is explained from scratch)

The Mission

It's 7:14 AM. You're not even at your desk yet when PagerDuty fires: payment-processor is crash-looping. The service restarts, runs for 45 seconds, dies, restarts, runs for 45 seconds, dies. Payments are failing. The on-call engineer from the night shift left a note: "restarted it three times, seemed fine each time, went back to sleep."

Your job: figure out why it's crash-looping, stop the bleeding, and harden the service so this doesn't happen at 7 AM again. Along the way, you're going to learn systemd deeper than most engineers ever go — unit types, dependency ordering, journald forensics, timer units, socket activation, resource controls, and security sandboxing.

Part 1: The First 60 Seconds — Reading the Wreckage

Two commands before anything else:

systemctl status payment-processor

● payment-processor.service - Payment Processing Worker
     Loaded: loaded (/etc/systemd/system/payment-processor.service; enabled)
     Active: activating (auto-restart) (Result: exit-code)
    Process: 28491 ExecStart=/opt/payments/bin/processor --config /etc/payments/app.conf (code=exited, status=1/FAILURE)
   Main PID: 28491 (code=exited, status=1/FAILURE)
        CPU: 892ms

Three things jump out:

What you see	What it means
`activating (auto-restart)`	systemd is in the delay between crash and next restart
`code=exited, status=1/FAILURE`	Process exited with code 1 — not killed, it chose to exit
`CPU: 892ms`	Under a second of CPU time across a ~45-second run: the process is mostly waiting, not computing

Now the logs:

journalctl -u payment-processor --since "30 minutes ago" --no-pager

A repeating pattern every ~50 seconds:

07:13:22 payment-processor[28344]: Starting payment processor v2.4.1
07:13:22 payment-processor[28344]: Connecting to database at db-primary.internal:5432
07:13:22 payment-processor[28344]: Connected. Loading payment queue...
07:14:07 payment-processor[28344]: FATAL: database connection lost: SSL handshake timeout
07:14:07 systemd[1]: payment-processor.service: Main process exited, code=exited, status=1/FAILURE
07:14:07 systemd[1]: payment-processor.service: Failed with result 'exit-code'.
07:14:12 systemd[1]: payment-processor.service: Scheduled restart job, restart counter is at 14.

The service connects to the database, runs for 45 seconds, then the connection drops with an SSL timeout. systemd restarts it 5 seconds later, and the cycle repeats.
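
If you only have a saved copy of the journal output, the restart counter can be pulled out of the text directly. This is a minimal sketch that assumes the message format shown above; the function name `last_restart_counter` is made up for illustration.

```shell
# Print the most recent "restart counter" value found in journal text on stdin.
# Assumes the message wording shown above; other systemd versions may differ.
last_restart_counter() {
  sed -n 's/.*restart counter is at \([0-9][0-9]*\)\..*/\1/p' | tail -n 1
}

# On a live host (requires systemd):
#   journalctl -u payment-processor --since "30 minutes ago" --no-pager | last_restart_counter
```

On a running system, `systemctl show payment-processor -p NRestarts` reports the same number without any text parsing.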

Mental Model: When debugging a restart loop, your first question is always: is the service crashing (non-zero exit code), being killed (signal 9/SIGKILL), or timing out? Exit code 1 = the application decided to die (check app logs). SIGKILL = something external killed it (check OOM killer, MemoryMax). Timeout = the process isn't starting or stopping within its limit (check TimeoutStartSec and TimeoutStopSec).
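
The same triage can be scripted from the unit's `Result` property. The sketch below maps the result values systemd documents for `$SERVICE_RESULT` to a first place to look; the function name and hint texts are illustrative, not systemd output.

```shell
# Map systemd's Result= property to the first place to look.
# Hint texts are illustrative; result names follow systemd's documented values.
classify_result() {
  case "$1" in
    success)          echo "clean exit: check whether Restart=always is relaunching it" ;;
    exit-code)        echo "application exited non-zero: read the application logs" ;;
    signal|core-dump) echo "killed by a signal: find out who sent it" ;;
    oom-kill)         echo "cgroup OOM kill: check MemoryMax" ;;
    timeout)          echo "start or stop timeout: check TimeoutStartSec and TimeoutStopSec" ;;
    watchdog)         echo "watchdog timeout: the service stopped sending keep-alives" ;;
    start-limit-hit)  echo "start rate limit hit: fix the cause, then systemctl reset-failed" ;;
    *)                echo "unclassified result: $1" ;;
  esac
}

# On a live host (requires systemd):
#   classify_result "$(systemctl show payment-processor -p Result --value)"
```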

The current unit file:

# /etc/systemd/system/payment-processor.service
[Unit]
Description=Payment Processing Worker
After=network.target

[Service]
Type=simple
ExecStart=/opt/payments/bin/processor --config /etc/payments/app.conf
Restart=always
RestartSec=5
User=payments
Group=payments

[Install]
WantedBy=multi-user.target

This unit file has problems. We'll fix them all by the end.

Part 2: Stop the Bleeding

The database team confirms: they're rotating SSL certificates on db-primary. Connections with the old cert are getting killed. The new cert will be ready in 20 minutes.

sudo systemctl stop payment-processor

This sticks even though the service has Restart=always. systemctl stop is an explicit admin action — systemd distinguishes "the process crashed" (triggers restart) from "an admin said stop" (obeys).

Gotcha: There is a case where stopping a service doesn't stick: socket activation. If payment-processor.socket exists, any incoming connection re-triggers the service. Always check: systemctl list-units 'payment-processor.*'

Part 3: Unit Types — Everything Is a Unit

Unit type	Suffix	What it does	Example
service	`.service`	A process or group of processes	`nginx.service`
socket	`.socket`	An IPC or network socket	`cups.socket`
timer	`.timer`	Triggers a service on a schedule	`logrotate.timer`
mount	`.mount`	A filesystem mount point	`var-log.mount`
target	`.target`	A group of units (like a runlevel)	`multi-user.target`
slice	`.slice`	A cgroup resource boundary	`user.slice`

For daily ops you'll use services (90%), timers (replacing cron), and occasionally sockets.

Where unit files live matters:

/etc/systemd/system/       → Admin overrides (highest priority)
/run/systemd/system/       → Runtime/transient units (ephemeral)
/usr/lib/systemd/system/   → Vendor defaults (lowest priority)

Remember: Priority mnemonic: ERC — Etc, Run, usr/lib (Core). Never edit files in /usr/lib/ — package updates overwrite them. Use systemctl edit <unit>.
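
`systemctl edit <unit>` leaves the vendor file alone and writes a drop-in under `/etc/systemd/system/<unit>.d/`. The sketch below writes an equivalent drop-in by hand into a scratch directory so it can be tried without root. The `DROPIN_ROOT` variable and the file name `10-limits.conf` are assumptions for illustration; on a real host the root would be `/etc/systemd/system`, followed by `systemctl daemon-reload`.

```shell
# Write a drop-in override for a unit under a chosen root directory.
# DROPIN_ROOT defaults to a throwaway temp dir so this is safe to run anywhere.
DROPIN_ROOT="${DROPIN_ROOT:-$(mktemp -d)}"
unit="payment-processor.service"

mkdir -p "$DROPIN_ROOT/$unit.d"
cat > "$DROPIN_ROOT/$unit.d/10-limits.conf" <<'EOF'
[Service]
# Only the keys listed here change; everything else comes from the main unit file.
RestartSec=10
EOF

echo "wrote $DROPIN_ROOT/$unit.d/10-limits.conf"
```

Once a drop-in like this is installed, `systemctl cat payment-processor` shows the main unit file followed by each drop-in, which is the quickest way to confirm what systemd actually loaded.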

Part 4: Rewriting the Unit File

Here's the hardened version:

# /etc/systemd/system/payment-processor.service
[Unit]
Description=Payment Processing Worker
After=network-online.target postgresql.service
Wants=network-online.target
Requires=postgresql.service
# Start rate limiting belongs in [Unit]; StartLimitIntervalSec= is ignored in [Service]
StartLimitIntervalSec=300
StartLimitBurst=5

[Service]
Type=notify
ExecStartPre=/opt/payments/bin/processor --validate-config /etc/payments/app.conf
ExecStart=/opt/payments/bin/processor --config /etc/payments/app.conf
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-failure
RestartSec=10

User=payments
Group=payments
WorkingDirectory=/opt/payments
EnvironmentFile=/etc/payments/env

# Resource controls
MemoryMax=1G
MemoryHigh=768M
CPUQuota=200%
TasksMax=256
LimitNOFILE=65536

# Security hardening
ProtectSystem=strict
ProtectHome=true
PrivateTmp=true
NoNewPrivileges=true
ReadWritePaths=/var/lib/payments /var/log/payments
ProtectKernelTunables=true
ProtectKernelModules=true
RestrictSUIDSGID=true

[Install]
WantedBy=multi-user.target

Let's break this into groups.

Dependencies

After=network-online.target postgresql.service
Wants=network-online.target
Requires=postgresql.service

The original had After=network.target. That's a trap.

Target	What it means
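`network.target`	The network management stack has started; interfaces may not be configured yet
`network-online.target`	The network is considered up (typically a routable address is configured), as judged by the network manager's wait-online service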

Gotcha: After=network.target is the #1 cause of "works on restart, fails on boot." The service starts before the network is up. Use network-online.target with Wants=network-online.target (it isn't pulled in by default).

Under the Hood: After= controls ordering. Requires= controls dependency. They're orthogonal. Requires= without After= starts both simultaneously — your app races against its database. After= without Requires= waits for it if it's starting, but doesn't pull it in. You almost always want both together.
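
A quick lint for the "Requires= without After=" mistake can be written in a few lines of shell. This sketch reads a single unit file and ignores drop-ins and implicit dependencies, so treat it as a rough check; the function name is made up, and `redis.service` in the usage note is a hypothetical dependency.

```shell
# List units named in Requires= that have no matching entry in After=.
# Reads one unit file directly; drop-ins and implicit dependencies are not considered.
requires_without_after() {
  unit_file="$1"
  requires=$(sed -n 's/^Requires=//p' "$unit_file" | tr '\n' ' ')
  after=$(sed -n 's/^After=//p' "$unit_file" | tr '\n' ' ')
  for dep in $requires; do
    case " $after " in
      *" $dep "*) ;;        # ordered after its requirement: fine
      *) echo "$dep" ;;     # required but unordered: both start in parallel
    esac
  done
}

# Example:
#   requires_without_after /etc/systemd/system/payment-processor.service
```

For a unit with `Requires=postgresql.service redis.service` and `After=postgresql.service`, the function prints `redis.service`.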

Service type and startup

Type=notify means the service explicitly tells systemd when it's ready by calling sd_notify(0, "READY=1"). With Type=simple (the original), systemd considers it "started" immediately after fork(), so dependent units and anything waiting on systemctl start don't wait for real readiness.

ExecStartPre= validates the config before starting. If invalid, you get a clear error instead of a crash 10 seconds later.

Type= value	When systemd considers it "started"	Best for
`simple`	Immediately after fork()	Scripts, most binaries
`exec`	After the binary successfully exec()'s	Catching missing-binary errors
`notify`	When the service calls sd_notify()	Apps with startup initialization
`forking`	When the parent process exits	Legacy daemons that double-fork
`oneshot`	When the process exits	Scripts that run and finish

War Story: A team set Type=simple for a Java service that took 30 seconds to initialize its connection pool. The load balancer started sending traffic immediately. Every deploy caused 30 seconds of 503 errors. Switching to Type=notify fixed it: the service received no traffic until the connection pool was warm.
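
For a service written as a shell script, the same readiness signal can be sent with the `systemd-notify` tool. A minimal sketch: `warm_up` is a hypothetical placeholder for slow initialization, and the unit is assumed to set `Type=notify`. Shell wrappers usually also need `NotifyAccess=all`, because `systemd-notify` runs as a child process rather than as the main PID.

```shell
# Placeholder for slow initialization (hypothetical; e.g. filling a connection pool).
warm_up() {
  sleep 0
}

# Tell systemd the service is ready, but only when it was started as a notify-type unit.
notify_ready() {
  if [ -n "${NOTIFY_SOCKET:-}" ]; then
    # systemd sets NOTIFY_SOCKET for services that are allowed to send notifications.
    systemd-notify --ready --status="connection pool warm"
  else
    echo "NOTIFY_SOCKET not set; skipping READY=1"
  fi
}

warm_up
notify_ready
```

Run outside systemd, the script simply reports that it skipped the notification, which makes it easy to test locally.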

Restart policy — the 10-second default that bites everyone

Policy	Restarts on...	Doesn't restart on...
`always`	Everything: clean exit, error, signal	Explicit `systemctl stop`
`on-failure`	Non-zero exit, signal death, timeout	Clean exit (code 0), `systemctl stop`
`on-abnormal`	Signal, timeout, watchdog	Any exit code (even non-zero)

War Story: The default StartLimitIntervalSec is 10 seconds and StartLimitBurst is 5. With RestartSec=0, a service that crashes on startup hits this limit in under a second. The service enters the "failed" state and you get: Start request repeated too quickly. systemctl start refuses until you run systemctl reset-failed <unit>. That only clears the symptom; you still have to fix whatever makes the service crash. Setting RestartSec=5 or higher keeps a crash loop under the default rate limit, so the service keeps retrying instead of landing in "failed".
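
The interaction between run time, RestartSec, and the start limit is simple arithmetic. As a rough model (an approximation that ignores boundary effects in systemd's rate limiter), start number burst + 1 is attempted burst × (run time + RestartSec) seconds after the first start. If that moment is still inside the interval, the unit fails with start-limit-hit; otherwise it keeps looping. The function name below is made up for illustration.

```shell
# Rough model: does a crash loop trip the start rate limit?
# Args: seconds the process runs before dying, RestartSec,
#       StartLimitIntervalSec, StartLimitBurst
start_limit_outcome() {
  run_sec=$1; restart_sec=$2; interval_sec=$3; burst=$4
  cycle=$((run_sec + restart_sec))
  # Start number (burst + 1) is attempted burst * cycle seconds after the first start.
  if [ $((burst * cycle)) -lt "$interval_sec" ]; then
    echo "start-limit-hit"
  else
    echo "loops indefinitely"
  fi
}

start_limit_outcome 45 5 10 5     # original unit, default limits → loops indefinitely
start_limit_outcome 45 10 300 5   # hardened unit → start-limit-hit
start_limit_outcome 0 0 10 5      # instant crash, RestartSec=0 → start-limit-hit
```

Under this model, the original unit could reach restart counter 14 without ever failing, while the hardened unit gives up after a handful of attempts and surfaces as a failed unit.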

Flashcard check — dependencies and lifecycle

Question	Cover the answer, then check
What's the difference between `network.target` and `network-online.target`?	`network.target` = network stack started, interfaces possibly not configured yet. `network-online.target` = network considered up. Use the latter for outbound connections.
Why pair `Requires=` with `After=`?	`Requires=` = must be running. `After=` = start after. Without `After=`, both start simultaneously.
What happens when `StartLimitBurst` is exceeded?	The unit enters "failed" state. Clear with `systemctl reset-failed`.
`Type=simple` vs `Type=notify`?	`simple`: started on fork(). `notify`: started when service calls sd_notify("READY=1").

Part 5: Resource Controls — cgroups You Didn't Know You Were Using

Every systemd service runs inside a cgroup (control group). You can see the hierarchy:

systemd-cgls

Under the Hood: cgroups v2 uses a single unified hierarchy. systemd was one of its earliest and most vocal adopters, and Lennart Poettering was among its strongest advocates. Each service's cgroup lives at /sys/fs/cgroup/system.slice/<service>.service/. Read raw values directly: cat /sys/fs/cgroup/system.slice/payment-processor.service/memory.current

Directive	What it does	On exceed
`MemoryMax=1G`	Hard memory ceiling	cgroup OOM killer fires (SIGKILL)
`MemoryHigh=768M`	Soft memory throttle	Kernel slows allocations
`CPUQuota=200%`	CPU limit (200% = 2 cores)	Throttled, not killed
`TasksMax=256`	Max threads/processes	Fork fails with EAGAIN
`LimitNOFILE=65536`	Max open file descriptors	open() fails with EMFILE

The killer combination is MemoryHigh + MemoryMax. Think of it as a warning track and a wall. At 768M the kernel throttles — the process slows but lives. At 1G the OOM killer fires. This gives you a window to notice before the process dies.
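
The warning track and the wall can be expressed as a three-way comparison. This is a sketch with a made-up function name; on a live host the reading would come from the cgroup's `memory.current` file, and the thresholds from the unit's MemoryHigh and MemoryMax.

```shell
# Classify a memory reading against MemoryHigh and MemoryMax (all values in bytes).
memory_zone() {
  current=$1; high=$2; max=$3
  if [ "$current" -ge "$max" ]; then
    echo "at MemoryMax: OOM kill imminent"
  elif [ "$current" -ge "$high" ]; then
    echo "above MemoryHigh: throttled"
  else
    echo "below MemoryHigh: normal"
  fi
}

MIB=$((1024 * 1024))
memory_zone $((500 * MIB)) $((768 * MIB)) $((1024 * MIB))   # → below MemoryHigh: normal
memory_zone $((800 * MIB)) $((768 * MIB)) $((1024 * MIB))   # → above MemoryHigh: throttled

# On a live host (cgroup v2), the first argument would be:
#   $(cat /sys/fs/cgroup/system.slice/payment-processor.service/memory.current)
```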

# Current memory usage
systemctl show payment-processor -p MemoryCurrent

# Live cgroup resource monitor
systemd-cgtop

Gotcha: The system can have 32 GB free and your service still gets OOM-killed. MemoryMax is cgroup-scoped — it doesn't care about system-wide memory. Always check MemoryMax before investigating system memory pressure.
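
To confirm that a kill came from the service's own cgroup rather than from system-wide pressure, check the `oom_kill` counter in the cgroup's `memory.events` file. A minimal sketch, assuming cgroups v2 and the path layout described above; the function name is made up.

```shell
# Print the oom_kill counter from a cgroup v2 memory.events file.
oom_kill_count() {
  awk '$1 == "oom_kill" { print $2 }' "$1"
}

# On a live host (cgroup v2):
#   oom_kill_count /sys/fs/cgroup/system.slice/payment-processor.service/memory.events
```

A non-zero value means the kernel killed at least one process in that cgroup for exceeding its memory limit, regardless of how much memory the rest of the system had free.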

Part 6: Security Hardening — Free Protection

Directive	What it does
`ProtectSystem=strict`	Mounts filesystem read-only except `ReadWritePaths=`
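`ProtectHome=true`	`/home`, `/root`, and `/run/user` inaccessible
`PrivateTmp=true`	Isolated `/tmp` and `/var/tmp` per service
`NoNewPrivileges=true`	The process and its children can't gain privileges through execve() (setuid/setgid bits and file capabilities are ignored)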
`ProtectKernelTunables=true`	`/proc/sys/`, `/sys/` read-only
`ProtectKernelModules=true`	Block loading kernel modules
`RestrictSUIDSGID=true`	Prevent creating setuid/setgid files

If the payment processor gets compromised, the attacker can't write to the filesystem (except two directories), can't read home directories, can't escalate privileges, and can't load kernel modules. All for free.

# Security audit with numeric score (lower is better)
systemd-analyze security payment-processor

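Gotcha: ProtectSystem=strict with no ReadWritePaths= means the service can't write anywhere outside /dev, /proc, /sys, and (with PrivateTmp=true) its private /tmp. The error in the journal is often just "Read-only file system" or "Permission denied", which looks like a user/group problem. Exit code 226/NAMESPACE in systemctl status is the telltale sign that systemd could not set up the sandbox at all, for example because a path listed in ReadWritePaths= does not exist.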
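
One common trigger for 226/NAMESPACE can be checked before restarting: every path in ReadWritePaths= has to exist. The sketch below reads a single unit file and prints the listed paths that are missing. The function name is made up, and drop-ins are not considered.

```shell
# Print ReadWritePaths= entries from a unit file that do not exist on disk.
missing_rw_paths() {
  for p in $(sed -n 's/^ReadWritePaths=//p' "$1"); do
    case "$p" in
      -*) continue ;;   # a leading "-" tells systemd to ignore a missing path
    esac
    [ -e "$p" ] || echo "$p"
  done
}

# Example:
#   missing_rw_paths /etc/systemd/system/payment-processor.service
```

Any path it prints needs to be created (with the right ownership for the service user) before the sandbox can be set up.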

Part 7: journald Deep Dive

The journal saved us this morning. Let's go deeper.

Structured fields — the killer feature

journald stores entries as structured data, not text. Every entry has machine-readable fields:

journalctl -u payment-processor -o json-pretty -n 1

{
    "_PID": "29104",
    "_UID": "997",
    "_COMM": "processor",
    "_SYSTEMD_UNIT": "payment-processor.service",
    "MESSAGE": "Connected. Loading payment queue...",
    "PRIORITY": "6"
}

These fields are searchable:

# Every log line from PID 29104
journalctl _PID=29104

# Every log line from UID 997 across all services
journalctl _UID=997

Trivia: journald's binary log format was one of the most controversial systemd decisions. Critics: you can't cat and grep your logs. Supporters: structured binary enables indexed searching, integrity verification, and fields that text can't represent. The debate helped spawn Devuan (Debian without systemd, 2014).

Patterns you'll actually use

journalctl -u payment-processor -f                      # Follow live
journalctl -u payment-processor -p err --since "1h ago"  # Recent errors
journalctl -b -1                                         # Previous boot logs
journalctl -k                                            # Kernel messages (like dmesg)
journalctl --disk-usage                                  # Journal disk space
journalctl --vacuum-time=7d                              # Prune old entries

Persistent vs volatile storage

With the default Storage=auto, journald writes to /var/log/journal/ if that directory exists and otherwise falls back to /run/log/journal/ (tmpfs, gone on reboot). For persistence, create /var/log/journal/ or set Storage=persistent in /etc/systemd/journald.conf.

Gotcha: By default the persistent journal may grow to 10% of the filesystem (capped at 4G), which can still crowd a small /var partition. Set explicit limits:
# /etc/systemd/journald.conf
[Journal]
SystemMaxUse=2G
SystemMaxFileSize=256M
MaxRetentionSec=30day

Part 8: Socket Activation — Why It's Elegant

Traditional startup: systemd starts a service, the service opens a socket, clients connect. Socket activation: systemd opens the socket first, queues connections, starts the service on first connection, and passes the open fd via $LISTEN_FDS.

# /etc/systemd/system/myapi.socket
[Unit]
Description=My API Socket
[Socket]
ListenStream=8080
Accept=no
[Install]
WantedBy=sockets.target

# /etc/systemd/system/myapi.service
[Unit]
Description=My API Server
Requires=myapi.socket
[Service]
Type=notify
ExecStart=/opt/myapi/bin/server

Why this is elegant:

Zero-downtime restart. systemd holds the listening socket during a service restart, so new connections queue instead of being refused.
On-demand startup. Rarely-used services start only when someone connects. Faster boot, less memory.
Implicit dependency resolution. Service A connects to B's socket before B is running. The connection queues. B starts, inherits the socket, completes the connection.
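
On the service side, the handoff follows the sd_listen_fds(3) convention: `LISTEN_PID` must match the process's own PID, and the inherited descriptors start at fd 3. The sketch below lists which descriptors a shell-based service would have received; the function name is made up for illustration.

```shell
# Print the file descriptors passed in by socket activation, or "none".
inherited_fds() {
  # Only trust LISTEN_FDS when LISTEN_PID names this very process.
  if [ "${LISTEN_PID:-}" != "$$" ] || [ -z "${LISTEN_FDS:-}" ]; then
    echo "none"
    return 0
  fi
  fd=3                               # passed descriptors always start at fd 3
  last=$((3 + LISTEN_FDS - 1))
  out=""
  while [ "$fd" -le "$last" ]; do
    out="$out$fd "
    fd=$((fd + 1))
  done
  echo "${out% }"
}

inherited_fds
```

With the single `ListenStream=8080` socket shown above, a service started by systemd would see `LISTEN_FDS=1` and find its listening socket on fd 3. Run from an ordinary terminal, the function prints `none`.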

Trivia: Socket activation was inspired by Apple's launchd (2005, macOS). The idea of passing open file descriptors from a supervisor to a service dates back to inetd (4.3BSD, mid-1980s), the original Unix "internet super-server." systemd's version is inetd's idea scaled to manage an entire operating system.

Part 9: Timers — Replacing Cron

You notice a crontab entry on this server:

0 */6 * * * /opt/payments/bin/cleanup-stale --older-than 24h >> /var/log/payments/cleanup.log 2>&1

No overlap prevention, logs go to a file, no resource limits, no missed-run recovery. Let's replace it.

# /etc/systemd/system/payment-cleanup.service
[Unit]
Description=Clean up stale payment records
After=postgresql.service
[Service]
Type=oneshot
ExecStart=/opt/payments/bin/cleanup-stale --older-than 24h
User=payments
Group=payments
MemoryMax=512M

# /etc/systemd/system/payment-cleanup.timer
[Unit]
Description=Run payment cleanup every 6 hours
[Timer]
OnCalendar=*-*-* 00/6:00:00
Persistent=true
RandomizedDelaySec=300
[Install]
WantedBy=timers.target

sudo systemctl daemon-reload
sudo systemctl enable --now payment-cleanup.timer
systemd-analyze calendar "*-*-* 00/6:00:00"  # Validate the schedule

Feature	cron	systemd timer
Logging	Redirect to file or email	Automatic via journald
Missed runs	Lost forever	`Persistent=true` runs on next boot
Overlap prevention	Requires `flock` wrapper	Automatic (a timer doesn't start a service that is still running)
Resource limits	None	`MemoryMax`, `CPUQuota`, etc.
Fleet load spread	Not possible	`RandomizedDelaySec` adds jitter

Remember: Timer advantages — PLRR: Persistent (missed runs recovered), Logging (journald), Randomized delay (no thundering herd), Resource limits.

Part 10: Transient Units and Boot Analysis

The database SSL rotation is done. Before restarting the service, test connectivity with a transient unit — a one-off command with full cgroup isolation:

sudo systemd-run \
  --unit=db-connectivity-test \
  --property=MemoryMax=256M \
  --property=User=payments \
  /opt/payments/bin/processor --test-db-connection

The transient unit disappears when the process exits. Logs stay in the journal.

While you're on this server, check boot time:

systemd-analyze                              # Total boot time
systemd-analyze blame | head -5              # Slowest units
systemd-analyze critical-chain payment-processor.service  # Dependency chain

payment-processor.service +301ms
└─postgresql.service @4.112s +412ms
  └─network-online.target @4.001s
    └─NetworkManager-wait-online.service @1.667s +2.334s

The 2.3-second NetworkManager-wait-online is the real bottleneck. Now you know where to look if boot time becomes a problem.

Part 11: Restart and Verify

sudo systemctl daemon-reload
sudo systemctl restart payment-processor

systemctl status payment-processor                        # Running?
journalctl -u payment-processor -f                        # Watch startup
systemctl show payment-processor -p NRestarts              # Should be 0
systemctl show payment-processor -p MemoryMax,MemoryHigh   # Limits applied?
systemd-analyze security payment-processor                 # Security score

Payments are flowing. The restart loop is gone. The service is hardened.

Flashcard Check — Part 2

Question	Cover the answer, then check
What does `ProtectSystem=strict` do?	Mounts filesystem read-only except `ReadWritePaths=`. Exit code `226/NAMESPACE` = sandboxing failure.
`MemoryHigh` vs `MemoryMax`?	`MemoryHigh` = soft throttle. `MemoryMax` = hard kill. Use both for graduated response.
What does `Persistent=true` do in a timer?	Runs the job on next boot if it was missed. Cron can't do this.
How do you create a one-off supervised command?	`systemd-run --property=MemoryMax=256M ./cmd` — creates a transient unit.
`mask` vs `disable`?	`disable` removes boot symlink. `mask` symlinks to `/dev/null` — blocks starting by any means.

Exercises

Exercise 1: Read a crash loop (2 minutes)

systemctl list-units --failed

Pick a failed unit. Run systemctl status <unit> and journalctl -u <unit> -n 30. Can you identify the exit code and root cause?

What to look for

Common exit codes: `203/EXEC` (binary not found), `217/USER` (user doesn't exist), `226/NAMESPACE` (sandboxing failed), `1/FAILURE` (generic app error — check app logs).

Exercise 2: Write a timer (10 minutes)

Replace this crontab with a systemd timer:

*/15 * * * * root /usr/local/bin/check-disk-space.sh >> /var/log/disk-check.log 2>&1

Requirements: Type=oneshot, persistent, 60-second random delay, 128M memory limit.

Solution

# /etc/systemd/system/disk-check.service
[Unit]
Description=Check disk space
[Service]
Type=oneshot
ExecStart=/usr/local/bin/check-disk-space.sh
MemoryMax=128M

# /etc/systemd/system/disk-check.timer
[Unit]
Description=Check disk space every 15 minutes
[Timer]
OnCalendar=*:0/15
Persistent=true
RandomizedDelaySec=60
[Install]
WantedBy=timers.target

sudo systemctl daemon-reload
sudo systemctl enable --now disk-check.timer

Exercise 3: Security audit (15 minutes)

Pick three services and check their security scores:

for svc in sshd nginx postgresql; do
  echo "=== $svc ==="
  systemd-analyze security "$svc" 2>/dev/null | tail -1
done

Which scores worst? Write a drop-in override adding ProtectSystem=strict, NoNewPrivileges=true, and PrivateTmp=true. Restart. Does it still work? If not, what ReadWritePaths= does it need?

Cheat Sheet

Service lifecycle

Command	Effect
`systemctl start/stop/restart <unit>`	Control running state
`systemctl reload <unit>`	Run the unit's `ExecReload=` (often SIGHUP: re-read config, no downtime)
`systemctl enable --now <unit>`	Start now + start on boot
`systemctl mask <unit>`	Prevent starting by any means
`systemctl daemon-reload`	Re-read unit files from disk
`systemctl reset-failed <unit>`	Clear "failed" state

Diagnostics

Command	Shows
`systemctl status <unit>`	State, PID, memory, recent logs
`systemctl cat <unit>`	Effective unit file with overrides
`systemctl list-units --failed`	All failed units
`journalctl -u <unit> -f`	Live logs
`journalctl -u <unit> -p err --since "1h ago"`	Recent errors
`journalctl -b -1`	Previous boot
`systemd-analyze security <unit>`	Security score
`systemd-analyze blame`	Slowest boot units
`systemd-cgtop`	Live cgroup resource monitor

Resource directives

Directive	On exceed
`MemoryMax=`	SIGKILL (OOM)
`MemoryHigh=`	Kernel throttles
`CPUQuota=`	Throttled
`TasksMax=`	Fork fails
`RuntimeMaxSec=`	Service terminated and marked failed (restarted only if `Restart=` allows)

Security directives

Directive	Effect
`ProtectSystem=strict`	Filesystem read-only except `ReadWritePaths=`
`ProtectHome=true`	`/home`, `/root` inaccessible
`PrivateTmp=true`	Isolated `/tmp`
`NoNewPrivileges=true`	No privilege escalation

Takeaways

systemctl status first, journalctl second. The exit code narrows the problem space before you start reading logs.
Requires= needs After=. Dependency without ordering means simultaneous startup. Always pair them.
MemoryHigh + MemoryMax = graduated response. Soft throttle before hard kill. Never use MemoryMax alone.
Timers over cron, always. Persistent missed-run recovery, journal logging, resource limits, random delay. No good reason for new cron jobs.
Security hardening is free. ProtectSystem=strict, NoNewPrivileges=true, PrivateTmp=true on every service. Fix the ReadWritePaths= errors that follow.
daemon-reload after every unit file change. This will bite you exactly once.

Related Lessons

The Hanging Deploy — processes, signals, systemd stop behavior, and TimeoutStopSec
From Init Scripts to systemd — SysV init to Upstart to systemd, and why the controversy matters
The Disk That Filled Up — journald storage limits, log rotation, the /var disaster
Out of Memory — cgroup OOM vs system OOM, the OOM killer's scoring algorithm