systemd: The Init System You Can't Avoid


Topics: systemd, unit files, journald, timers, cgroups, socket activation, security hardening
Level: L1–L2 (Foundations → Operations)
Time: 60–90 minutes
Prerequisites: None (everything is explained from scratch)


The Mission

It's 7:14 AM. You're not even at your desk yet when PagerDuty fires: payment-processor is crash-looping. The service restarts, runs for 45 seconds, dies, restarts, runs for 45 seconds, dies. Payments are failing. The on-call engineer from the night shift left a note: "restarted it three times, seemed fine each time, went back to sleep."

Your job: figure out why it's crash-looping, stop the bleeding, and harden the service so this doesn't happen at 7 AM again. Along the way, you're going to learn systemd deeper than most engineers ever go — unit types, dependency ordering, journald forensics, timer units, socket activation, resource controls, and security sandboxing.


Part 1: The First 60 Seconds — Reading the Wreckage

Two commands before anything else:

systemctl status payment-processor
● payment-processor.service - Payment Processing Worker
     Loaded: loaded (/etc/systemd/system/payment-processor.service; enabled)
     Active: activating (auto-restart) (Result: exit-code)
    Process: 28491 ExecStart=/opt/payments/bin/processor --config /etc/payments/app.conf (code=exited, status=1/FAILURE)
   Main PID: 28491 (code=exited, status=1/FAILURE)
        CPU: 892ms

Three things jump out:

| What you see | What it means |
| --- | --- |
| activating (auto-restart) | systemd is in the delay between the crash and the next restart |
| code=exited, status=1/FAILURE | Process exited with code 1: it was not killed, it chose to exit |
| CPU: 892ms | Under a second of CPU time in total: the process spends its life waiting, not computing |

Now the logs:

journalctl -u payment-processor --since "30 minutes ago" --no-pager

A repeating pattern every ~50 seconds:

07:13:22 payment-processor[28344]: Starting payment processor v2.4.1
07:13:22 payment-processor[28344]: Connecting to database at db-primary.internal:5432
07:13:22 payment-processor[28344]: Connected. Loading payment queue...
07:14:07 payment-processor[28344]: FATAL: database connection lost: SSL handshake timeout
07:14:07 systemd[1]: payment-processor.service: Main process exited, code=exited, status=1/FAILURE
07:14:07 systemd[1]: payment-processor.service: Failed with result 'exit-code'.
07:14:12 systemd[1]: payment-processor.service: Scheduled restart job, restart counter is at 14.

The service connects to the database, runs for 45 seconds, then the connection drops with an SSL timeout. systemd restarts it 5 seconds later, and the cycle repeats.
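
To confirm the loop cadence without eyeballing timestamps, a small filter can count the restart lines systemd writes. This is a minimal sketch: count_restarts is a hypothetical helper name, and it assumes the journal wording matches the "Scheduled restart job" message shown above, which may differ between systemd versions.

```shell
# count_restarts: count systemd's "Scheduled restart job" lines on stdin.
# Hypothetical helper; assumes the message wording from the excerpt above.
count_restarts() {
  # grep -c exits non-zero when the count is 0, so normalize the status.
  grep -c 'Scheduled restart job' || true
}

# Usage on a live host (not run here):
#   journalctl -u payment-processor --since "30 minutes ago" --no-pager | count_restarts
```

Dividing the length of the time window by the count gives a rough loop period, which should land near the ~50 seconds seen in the log.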

Mental Model: When debugging a restart loop, your first question is always: is the service crashing (exit code 1), being killed (signal 9/SIGKILL), or timing out? Exit code 1 = the application decided to die (check app logs). SIGKILL = something external killed it (check OOM killer, MemoryMax). Timeout = the process took too long to start or stop (check TimeoutStartSec, TimeoutStopSec).
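
The triage question above can be asked of systemd directly, because each unit exposes a Result property. The sketch below maps the common Result values to a first step. The function name and the advice strings are this example's own; only the Result values come from systemd.

```shell
# classify_result RESULT: map a unit's Result= value to a first debugging step.
# The advice strings are illustrative wording, not systemd output.
classify_result() {
  case "$1" in
    success)          echo "clean exit: nothing to debug" ;;
    exit-code)        echo "app exited non-zero: read the application logs" ;;
    signal|core-dump) echo "killed by a signal: find out who sent it" ;;
    oom-kill)         echo "cgroup OOM kill: check MemoryMax" ;;
    timeout)          echo "start or stop timed out: check TimeoutStartSec and TimeoutStopSec" ;;
    start-limit-hit)  echo "restart rate limit hit: fix the cause, then systemctl reset-failed" ;;
    *)                echo "unrecognized result: $1" ;;
  esac
}

# Usage on a live host (not run here):
#   classify_result "$(systemctl show -p Result --value payment-processor)"
```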

The current unit file:

# /etc/systemd/system/payment-processor.service
[Unit]
Description=Payment Processing Worker
After=network.target

[Service]
Type=simple
ExecStart=/opt/payments/bin/processor --config /etc/payments/app.conf
Restart=always
RestartSec=5
User=payments
Group=payments

[Install]
WantedBy=multi-user.target

This unit file has problems. We'll fix them all by the end.


Part 2: Stop the Bleeding

The database team confirms: they're rotating SSL certificates on db-primary. Connections with the old cert are getting killed. The new cert will be ready in 20 minutes.

sudo systemctl stop payment-processor

This sticks even though the service has Restart=always. systemctl stop is an explicit admin action — systemd distinguishes "the process crashed" (triggers restart) from "an admin said stop" (obeys).

Gotcha: There is a case where stopping a service doesn't stick: socket activation. If payment-processor.socket exists, any incoming connection re-triggers the service. Always check: systemctl list-units 'payment-processor.*'


Part 3: Unit Types — Everything Is a Unit

| Unit type | Suffix | What it does | Example |
| --- | --- | --- | --- |
| service | .service | A process or group of processes | nginx.service |
| socket | .socket | An IPC or network socket | cups.socket |
| timer | .timer | Triggers a service on a schedule | logrotate.timer |
| mount | .mount | A filesystem mount point | var-log.mount |
| target | .target | A group of units (like a runlevel) | multi-user.target |
| slice | .slice | A cgroup resource boundary | user.slice |

For daily ops you'll use services (90%), timers (replacing cron), and occasionally sockets.

Where unit files live matters:

/etc/systemd/system/       → Admin overrides (highest priority)
/run/systemd/system/       → Runtime/transient units (ephemeral)
/usr/lib/systemd/system/   → Vendor defaults (lowest priority)

Remember: Priority mnemonic ERC: Etc, Run, usr/lib (Core). Never edit files in /usr/lib/, because package updates overwrite them. Use systemctl edit <unit>.
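
systemctl edit <unit> stores its changes in a drop-in file, /etc/systemd/system/<unit>.d/override.conf, which systemd merges over the vendor unit. The sketch below writes the same layout by hand under a root directory you pass in, so it can be tried in a scratch directory first. write_dropin is a hypothetical helper name and the RestartSec=10 setting is only an illustration.

```shell
# write_dropin ROOT UNIT: write stdin to ROOT/UNIT.d/override.conf,
# the layout `systemctl edit UNIT` uses under /etc/systemd/system.
write_dropin() {
  dir="$1/$2.d"
  mkdir -p "$dir" && cat > "$dir/override.conf"
}

# Try it in a scratch directory instead of /etc/systemd/system.
scratch=$(mktemp -d)
write_dropin "$scratch" payment-processor.service <<'EOF'
[Service]
RestartSec=10
EOF
cat "$scratch/payment-processor.service.d/override.conf"
```

On a real host the root would be /etc/systemd/system, followed by systemctl daemon-reload. systemctl cat <unit> then shows the vendor file and the drop-in together.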


Part 4: Rewriting the Unit File

Here's the hardened version:

# /etc/systemd/system/payment-processor.service
[Unit]
Description=Payment Processing Worker
After=network-online.target postgresql.service
Wants=network-online.target
Requires=postgresql.service
# Start rate limiting belongs in [Unit]; StartLimitIntervalSec= is ignored under [Service]
StartLimitIntervalSec=300
StartLimitBurst=5

[Service]
Type=notify
ExecStartPre=/opt/payments/bin/processor --validate-config /etc/payments/app.conf
ExecStart=/opt/payments/bin/processor --config /etc/payments/app.conf
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-failure
RestartSec=10

User=payments
Group=payments
WorkingDirectory=/opt/payments
EnvironmentFile=/etc/payments/env

# Resource controls
MemoryMax=1G
MemoryHigh=768M
CPUQuota=200%
TasksMax=256
LimitNOFILE=65536

# Security hardening
ProtectSystem=strict
ProtectHome=true
PrivateTmp=true
NoNewPrivileges=true
ReadWritePaths=/var/lib/payments /var/log/payments
ProtectKernelTunables=true
ProtectKernelModules=true
RestrictSUIDSGID=true

[Install]
WantedBy=multi-user.target

Let's break this into groups.

Dependencies

After=network-online.target postgresql.service
Wants=network-online.target
Requires=postgresql.service

The original had After=network.target. That's a trap.

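| Target | What it means |
| --- | --- |
| network.target | The network management stack has started; interfaces may not be configured yet |
| network-online.target | The network is configured and usable as judged by the wait-online service (usually a routable address is assigned) |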

Gotcha: After=network.target is the #1 cause of "works on restart, fails on boot." The service starts before the network is up. Use network-online.target with Wants=network-online.target (it isn't pulled in by default).

Under the Hood: After= controls ordering. Requires= controls dependency. They're orthogonal. Requires= without After= starts both simultaneously — your app races against its database. After= without Requires= waits for it if it's starting, but doesn't pull it in. You almost always want both together.
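
Because a missing After= is easy to overlook in review, a quick lint over a unit file can flag it. The sketch below is deliberately simplified: requires_without_after is a hypothetical helper, it reads a single file, and it assumes one directive per line with no line continuations or drop-ins.

```shell
# requires_without_after FILE: list units named in Requires= that are
# missing from After=. Simplified parser: single file, one directive per line.
requires_without_after() {
  awk -F= '
    $1 == "Requires" { n = split($2, w, " "); for (i = 1; i <= n; i++) req[w[i]] = 1 }
    $1 == "After"    { n = split($2, w, " "); for (i = 1; i <= n; i++) aft[w[i]] = 1 }
    END { for (u in req) if (!(u in aft)) print u }
  ' "$1" | sort
}

# Usage (not run here):
#   requires_without_after /etc/systemd/system/payment-processor.service
```

Empty output means every required unit is also ordered. For the merged view including drop-ins on a live host, systemctl show -p Requires,After <unit> is the authoritative source.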

Service type and startup

Type=notify means the service explicitly tells systemd when it's ready by calling sd_notify(0, "READY=1"). With Type=simple (the original), systemd considers it "started" instantly after fork(), so health checks and dependent services don't wait for real readiness.

ExecStartPre= validates the config before starting. If invalid, you get a clear error instead of a crash 10 seconds later.

| Type= value | When systemd considers it "started" | Best for |
| --- | --- | --- |
| simple | Immediately after fork() | Scripts, most binaries |
| exec | After the binary successfully exec()'s | Catching missing-binary errors |
| notify | When the service calls sd_notify() | Apps with startup initialization |
| forking | When the parent process exits | Legacy daemons that double-fork |
| oneshot | When the process exits | Scripts that run and finish |

War Story: A team set Type=simple for a Java service that took 30 seconds to initialize its connection pool. The load balancer started sending traffic immediately. Every deploy caused 30 seconds of 503 errors. Switching to Type=notify fixed it: the service was not reported as started, and so received no traffic, until the connection pool was warm.
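
For services started through a shell wrapper, the readiness signal can be sent with the systemd-notify command instead of the C API. The sketch below guards on NOTIFY_SOCKET, which systemd sets for Type=notify units, so the same script still runs when launched by hand. The status text is made up for illustration, and a wrapper like this may also need NotifyAccess=all in the unit because systemd-notify runs as a separate child process. Treat that as something to verify on your systemd version.

```shell
# notify_ready: tell systemd the service is ready, but only when systemd
# actually provided a notification socket.
notify_ready() {
  if [ -n "${NOTIFY_SOCKET:-}" ]; then
    systemd-notify --ready --status="connection pool warm"
  else
    echo "NOTIFY_SOCKET unset; skipping readiness notification"
  fi
}

# Hypothetical wrapper flow (not run here):
#   do the slow initialization first, then call notify_ready,
#   so dependents are released only once the service can take work.
```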

Restart policy — the 10-second default that bites everyone

| Policy | Restarts on... | Doesn't restart on... |
| --- | --- | --- |
| always | Everything: clean exit, error, signal | Explicit systemctl stop |
| on-failure | Non-zero exit, signal death, timeout | Clean exit (code 0), systemctl stop |
| on-abnormal | Signal, timeout, watchdog | Any exit code (even non-zero) |

War Story: The default StartLimitIntervalSec is 10 seconds and StartLimitBurst is 5. With RestartSec=0, a service that crashes on startup hits this limit in under a second. The service enters "failed" state and you get: Start request repeated too quickly. systemctl start refuses. To recover, fix the underlying problem, run systemctl reset-failed <unit>, and start the service again. To keep a crash loop from tripping the default limit in the first place, set RestartSec=5 or higher: only about three starts then fit in any 10-second window, below the burst of 5.
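
The rate-limit arithmetic is worth doing once by hand. The sketch below models a steady crash loop that starts once every PERIOD seconds (run time before the crash plus RestartSec) and asks whether more than StartLimitBurst starts fit in one StartLimitIntervalSec window. It is an approximation in whole seconds, not systemd's exact accounting, and both function names are hypothetical.

```shell
# starts_in_window PERIOD INTERVAL: starts that fit in one rate-limit window
# when the service starts once every PERIOD seconds. Integer seconds only.
starts_in_window() {
  if [ "$1" -le 0 ]; then echo "unbounded"; return; fi
  echo $(( $2 / $1 + 1 ))
}

# trips_start_limit PERIOD INTERVAL BURST: "yes" if the loop exceeds
# BURST starts inside INTERVAL seconds, else "no".
trips_start_limit() {
  n=$(starts_in_window "$1" "$2")
  if [ "$n" = "unbounded" ] || [ "$n" -gt "$3" ]; then echo yes; else echo no; fi
}

trips_start_limit 50 10 5    # this morning: 45 s run + RestartSec=5, default limits -> no
trips_start_limit 10 300 5   # hardened unit crashing at startup: RestartSec=10, 300 s window -> yes
```

The first result explains why the restart counter could climb to 14 without the unit ever entering "failed": a 50-second loop never comes close to five starts in ten seconds. The second shows the hardened settings giving up after a handful of fast failures instead of looping indefinitely.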

Flashcard check — dependencies and lifecycle

| Question | Cover the answer, then check |
| --- | --- |
| What's the difference between network.target and network-online.target? | network.target = network management stack started. network-online.target = network configured and usable. Use the latter for outbound connections. |
| Why pair Requires= with After=? | Requires= pulls the dependency in and ties this unit to it. After= orders startup. Without After=, both start simultaneously. |
| What happens when StartLimitBurst is exceeded? | The unit enters "failed" state. Clear with systemctl reset-failed. |
| Type=simple vs Type=notify? | simple: started on fork(). notify: started when the service calls sd_notify(0, "READY=1"). |

Part 5: Resource Controls — cgroups You Didn't Know You Were Using

Every systemd service runs inside a cgroup (control group). You can see the hierarchy:

systemd-cgls

Under the Hood: cgroups v2 uses a single unified hierarchy. systemd was one of its earliest and most prominent adopters, and Lennart Poettering was one of the strongest advocates. Each service's cgroup lives at /sys/fs/cgroup/system.slice/<service>.service/. Read raw values directly: cat /sys/fs/cgroup/system.slice/payment-processor.service/memory.current

| Directive | What it does | On exceed |
| --- | --- | --- |
| MemoryMax=1G | Hard memory ceiling | cgroup OOM killer fires (SIGKILL) |
| MemoryHigh=768M | Soft memory throttle | Kernel slows allocations |
| CPUQuota=200% | CPU limit (200% = 2 cores) | Throttled, not killed |
| TasksMax=256 | Max threads/processes | Fork fails with EAGAIN |
| LimitNOFILE=65536 | Max open file descriptors | open() fails with EMFILE |

The killer combination is MemoryHigh + MemoryMax. Think of it as a warning track and a wall. At 768M the kernel throttles — the process slows but lives. At 1G the OOM killer fires. This gives you a window to notice before the process dies.
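
The warning track and the wall are both visible as plain files in the service's cgroup directory. The sketch below reads memory.current, memory.high, and memory.max (standard cgroup v2 files) and reports which zone the service is in. memory_state is a hypothetical helper, and the zone names simply reuse the metaphor above.

```shell
# memory_state CGROUP_DIR: report where a cgroup v2 group sits relative to
# memory.high (throttle point) and memory.max (OOM-kill point).
memory_state() {
  cur=$(cat "$1/memory.current")
  high=$(cat "$1/memory.high")
  max=$(cat "$1/memory.max")
  # The kernel writes the literal word "max" when no limit is set.
  if [ "$max" != "max" ] && [ "$cur" -ge "$max" ]; then
    echo "at the wall"
  elif [ "$high" != "max" ] && [ "$cur" -ge "$high" ]; then
    echo "on the warning track"
  else
    echo "ok"
  fi
}

# Usage on a live host (not run here):
#   memory_state /sys/fs/cgroup/system.slice/payment-processor.service
```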

# Current memory usage
systemctl show payment-processor -p MemoryCurrent

# Live cgroup resource monitor
systemd-cgtop

Gotcha: The system can have 32 GB free and your service still gets OOM-killed. MemoryMax is cgroup-scoped — it doesn't care about system-wide memory. Always check MemoryMax before investigating system memory pressure.
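
To tell a cgroup-scoped OOM kill apart from system-wide pressure, the cgroup's own memory.events file keeps a running oom_kill counter. A minimal sketch, with oom_kills as a hypothetical helper name:

```shell
# oom_kills CGROUP_DIR: print the oom_kill counter from memory.events.
oom_kills() {
  awk '$1 == "oom_kill" { print $2 }' "$1/memory.events"
}

# Usage on a live host (not run here):
#   oom_kills /sys/fs/cgroup/system.slice/payment-processor.service
```

A non-zero value means the kill came from this service's own limit, whatever the rest of the machine had free.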


Part 6: Security Hardening — Free Protection

| Directive | What it does |
| --- | --- |
| ProtectSystem=strict | Mounts filesystem read-only except ReadWritePaths= |
| ProtectHome=true | /home, /root inaccessible |
| PrivateTmp=true | Isolated /tmp per service |
| NoNewPrivileges=true | No privilege escalation: exec cannot gain privileges through setuid/setgid bits or file capabilities |
| ProtectKernelTunables=true | /proc/sys/, /sys/ read-only |
| ProtectKernelModules=true | Block loading kernel modules |
| RestrictSUIDSGID=true | Prevent creating setuid/setgid files |

If the payment processor gets compromised, the attacker can't write to the filesystem (except two directories), can't read home directories, can't escalate privileges, and can't load kernel modules. All for free.

# Security audit with numeric score (lower is better)
systemd-analyze security payment-processor

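Gotcha: ProtectSystem=strict with no ReadWritePaths= means the service can't write anywhere on the regular filesystem (only API filesystems such as /dev, /proc, and /sys are exempt). The error in the journal is often just "Read-only file system" or "Permission denied", which looks like a user/group problem. Exit code 226/NAMESPACE in systemctl status is a different failure: systemd could not set up the sandbox at all, commonly because a path listed in ReadWritePaths= does not exist.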
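
Since a missing ReadWritePaths= directory is a common way to end up with 226/NAMESPACE, it is cheap to check the paths before restarting. The sketch below is a simplified pre-flight check: missing_rw_paths is a hypothetical helper, it reads one unit file, and it ignores drop-ins and the optional "-" and "+" path prefixes.

```shell
# missing_rw_paths FILE: print each ReadWritePaths= entry that does not exist.
# Simplified: single file, space-separated paths, no "-" or "+" prefixes.
missing_rw_paths() {
  grep '^ReadWritePaths=' "$1" | cut -d= -f2- | tr ' ' '\n' |
  while read -r p; do
    [ -n "$p" ] && [ ! -e "$p" ] && echo "$p"
  done
  return 0
}

# Usage (not run here):
#   missing_rw_paths /etc/systemd/system/payment-processor.service
```

Any path it prints should be created with the right ownership before the next daemon-reload and restart.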


Part 7: journald Deep Dive

The journal saved us this morning. Let's go deeper.

Structured fields — the killer feature

journald stores entries as structured data, not text. Every entry has machine-readable fields:

journalctl -u payment-processor -o json-pretty -n 1
{
    "_PID": "29104",
    "_UID": "997",
    "_COMM": "processor",
    "_SYSTEMD_UNIT": "payment-processor.service",
    "MESSAGE": "Connected. Loading payment queue...",
    "PRIORITY": "6"
}

These fields are searchable:

# Every log line from PID 29104
journalctl _PID=29104

# Every log line from UID 997 across all services
journalctl _UID=997

Trivia: journald's binary log format was one of the most controversial systemd decisions. Critics: you can't cat and grep your logs. Supporters: structured binary enables indexed searching, integrity verification, and fields that text can't represent. The debate helped spawn Devuan (Debian without systemd, 2014).

Patterns you'll actually use

journalctl -u payment-processor -f                      # Follow live
journalctl -u payment-processor -p err --since "1h ago"  # Recent errors
journalctl -b -1                                         # Previous boot logs
journalctl -k                                            # Kernel messages (like dmesg)
journalctl --disk-usage                                  # Journal disk space
journalctl --vacuum-time=7d                              # Prune old entries

Persistent vs volatile storage

By default (Storage=auto), journald writes to /var/log/journal/ if that directory exists and otherwise to /run/log/journal/ (tmpfs, gone on reboot). For persistence, create /var/log/journal/ or set Storage=persistent in /etc/systemd/journald.conf.
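
Whether a host keeps its journal across reboots therefore comes down to a directory check. The sketch below encodes that rule. journal_storage_mode is a hypothetical helper that assumes Storage=auto and does not read journald.conf, and the optional ROOT argument exists only so the function can be pointed at a test tree.

```shell
# journal_storage_mode [ROOT]: under Storage=auto, journald persists logs
# only if ROOT/var/log/journal exists. ROOT defaults to the real root.
journal_storage_mode() {
  if [ -d "${1:-}/var/log/journal" ]; then
    echo persistent
  else
    echo volatile
  fi
}

# Usage on a live host (not run here):
#   journal_storage_mode
```

After creating the directory, restart systemd-journald so it starts writing there.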

Gotcha: Without size limits, persistent storage eats your /var partition. Always set:

# /etc/systemd/journald.conf
[Journal]
SystemMaxUse=2G
SystemMaxFileSize=256M
MaxRetentionSec=30day


Part 8: Socket Activation — Why It's Elegant

Traditional startup: systemd starts a service, the service opens a socket, clients connect. Socket activation: systemd opens the socket first, queues connections, starts the service on first connection, and passes the open fd via $LISTEN_FDS.

# /etc/systemd/system/myapi.socket
[Unit]
Description=My API Socket

[Socket]
ListenStream=8080
Accept=no

[Install]
WantedBy=sockets.target

# /etc/systemd/system/myapi.service
[Unit]
Description=My API Server
Requires=myapi.socket

[Service]
Type=notify
ExecStart=/opt/myapi/bin/server

Why this is elegant:

  1. Zero-downtime restart. systemd holds the socket during service restart, so new connections queue instead of being refused.
  2. On-demand startup. Rarely-used services start only when someone connects. Faster boot, less memory.
  3. Implicit dependency resolution. Service A connects to B's socket before B is running. The connection queues. B starts, inherits the socket, completes the connection.

Trivia: Socket activation was inspired by Apple's launchd (2005, macOS). The idea of passing open file descriptors from a supervisor to a service dates back to inetd (mid-1980s BSD), the original Unix "internet super-server." systemd's version is inetd's idea scaled to manage an entire operating system.
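
On the service side, the handoff is a small environment-variable protocol: LISTEN_FDS says how many sockets were passed, LISTEN_PID says which process they are meant for, and the descriptors start at fd 3. The sketch below lists the inherited descriptor numbers. inherited_fds is a hypothetical helper, and real services normally use sd_listen_fds() or their language's equivalent.

```shell
# inherited_fds: print the file descriptor numbers systemd passed in, one
# per line. Passed sockets start at fd 3; the LISTEN_PID check guards
# against variables inherited from some other process.
inherited_fds() {
  [ "${LISTEN_PID:-}" = "$$" ] || return 0
  i=0
  while [ "$i" -lt "${LISTEN_FDS:-0}" ]; do
    echo $(( 3 + i ))
    i=$(( i + 1 ))
  done
}

# Inside a socket-activated shell service (not run here):
#   for fd in $(inherited_fds); do echo "listening socket on fd $fd"; done
```

With the myapi.socket unit above there is one ListenStream=, so the service would see LISTEN_FDS=1 and find its listening socket on fd 3.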


Part 9: Timers — Replacing Cron

You notice a crontab entry on this server:

0 */6 * * * /opt/payments/bin/cleanup-stale --older-than 24h >> /var/log/payments/cleanup.log 2>&1

No overlap prevention, logs go to a file, no resource limits, no missed-run recovery. Let's replace it.

# /etc/systemd/system/payment-cleanup.service
[Unit]
Description=Clean up stale payment records
After=postgresql.service

[Service]
Type=oneshot
ExecStart=/opt/payments/bin/cleanup-stale --older-than 24h
User=payments
Group=payments
MemoryMax=512M

# /etc/systemd/system/payment-cleanup.timer
[Unit]
Description=Run payment cleanup every 6 hours

[Timer]
OnCalendar=*-*-* 00/6:00:00
Persistent=true
RandomizedDelaySec=300

[Install]
WantedBy=timers.target

sudo systemctl daemon-reload
sudo systemctl enable --now payment-cleanup.timer
systemd-analyze calendar "*-*-* 00/6:00:00"  # Validate the schedule

| Feature | cron | systemd timer |
| --- | --- | --- |
| Logging | Redirect to file or email | Automatic via journald |
| Missed runs | Lost forever | Persistent=true runs on next boot |
| Overlap prevention | Requires flock wrapper | Automatic: a timer does not start a service that is still running |
| Resource limits | None | MemoryMax, CPUQuota, etc. |
| Fleet load spread | Not built in | RandomizedDelaySec adds jitter |

Remember: Timer advantages — PLRR: Persistent (missed runs recovered), Logging (journald), Randomized delay (no thundering herd), Resource limits.
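
When migrating many cron entries, the timer half is repetitive enough to template. The sketch below prints a timer unit from three arguments. make_timer is a hypothetical helper, and it always sets Persistent=true, which is this example's choice and not a systemd default.

```shell
# make_timer DESCRIPTION ONCALENDAR JITTER_SEC: print a timer unit to stdout.
make_timer() {
  cat <<EOF
[Unit]
Description=$1

[Timer]
OnCalendar=$2
Persistent=true
RandomizedDelaySec=$3

[Install]
WantedBy=timers.target
EOF
}

make_timer "Run payment cleanup every 6 hours" "*-*-* 00/6:00:00" 300
```

Redirect the output to /etc/systemd/system/<name>.timer, check the schedule with systemd-analyze calendar, then daemon-reload and enable the timer as shown above.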


Part 10: Transient Units and Boot Analysis

The database SSL rotation is done. Before restarting the service, test connectivity with a transient unit — a one-off command with full cgroup isolation:

sudo systemd-run \
  --unit=db-connectivity-test \
  --property=MemoryMax=256M \
  --property=User=payments \
  /opt/payments/bin/processor --test-db-connection

The transient unit disappears when the process exits. Logs stay in the journal.
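
For one-off checks like this it helps to wait for the result and have the unit cleaned up even when the command fails. The sketch below wraps systemd-run with --wait (block until the command exits and report its status) and --collect (unload the unit afterwards even if it failed). run_limited and the DRY_RUN switch are this example's own conventions, and on a real host the call still needs root, as in the sudo invocation above.

```shell
# run_limited MEM CMD...: run CMD in a transient unit capped at MEM.
# With DRY_RUN set, print the systemd-run command instead of running it.
run_limited() {
  mem=$1
  shift
  ${DRY_RUN:+echo} systemd-run --wait --collect --property=MemoryMax="$mem" "$@"
}

# Preview the command line; unset DRY_RUN to run it for real.
DRY_RUN=1
run_limited 256M /opt/payments/bin/processor --test-db-connection
```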

While you're on this server, check boot time:

systemd-analyze                              # Total boot time
systemd-analyze blame | head -5              # Slowest units
systemd-analyze critical-chain payment-processor.service  # Dependency chain

payment-processor.service +301ms
└─postgresql.service @4.112s +412ms
  └─network-online.target @4.001s
    └─NetworkManager-wait-online.service @1.667s +2.334s

The 2.3-second NetworkManager-wait-online is the real bottleneck. Now you know where to look if boot time becomes a problem.
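
Picking the slowest link out of a longer chain can be automated. The sketch below reads critical-chain output on stdin and prints the unit with the largest "+" duration. slowest_in_chain is a hypothetical helper, and it only understands durations written as a single s or ms value, as in the output above. Longer forms such as "1min 2.3s" are not handled.

```shell
# slowest_in_chain: read `systemd-analyze critical-chain` output on stdin and
# print the unit with the largest "+duration". Handles "+412ms" and "+2.334s".
slowest_in_chain() {
  awk '
    {
      for (i = 1; i <= NF; i++) if ($i ~ /^\+[0-9.]+m?s$/) {
        v = $i; sub(/^\+/, "", v)
        if (v ~ /ms$/) { sub(/ms$/, "", v); ms = v + 0 }
        else           { sub(/s$/, "", v);  ms = v * 1000 }
        name = $1; gsub(/^[^A-Za-z]+/, "", name)   # strip tree-drawing characters
        if (ms > max) { max = ms; unit = name }
      }
    }
    END { if (unit != "") printf "%s %.0fms\n", unit, max }
  '
}

# Usage on a live host (not run here):
#   systemd-analyze critical-chain payment-processor.service | slowest_in_chain
```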


Part 11: Restart and Verify

sudo systemctl daemon-reload
sudo systemctl restart payment-processor
systemctl status payment-processor                        # Running?
journalctl -u payment-processor -f                        # Watch startup
systemctl show payment-processor -p NRestarts              # Should be 0
systemctl show payment-processor -p MemoryMax,MemoryHigh   # Limits applied?
systemd-analyze security payment-processor                 # Security score

Payments are flowing. The restart loop is gone. The service is hardened.


Flashcard Check — Part 2

| Question | Cover the answer, then check |
| --- | --- |
| What does ProtectSystem=strict do? | Mounts filesystem read-only except ReadWritePaths=. Exit code 226/NAMESPACE = sandboxing failure. |
| MemoryHigh vs MemoryMax? | MemoryHigh = soft throttle. MemoryMax = hard kill. Use both for graduated response. |
| What does Persistent=true do in a timer? | Runs the job on next boot if it was missed. Cron can't do this. |
| How do you create a one-off supervised command? | systemd-run --property=MemoryMax=256M ./cmd creates a transient unit. |
| mask vs disable? | disable removes boot symlink. mask symlinks to /dev/null, which blocks starting by any means. |

Exercises

Exercise 1: Read a crash loop (2 minutes)

systemctl list-units --failed

Pick a failed unit. Run systemctl status <unit> and journalctl -u <unit> -n 30. Can you identify the exit code and root cause?

What to look for: Common exit codes are 203/EXEC (binary not found), 217/USER (user doesn't exist), 226/NAMESPACE (sandboxing failed), and 1/FAILURE (generic app error; check app logs).

Exercise 2: Write a timer (10 minutes)

Replace this crontab with a systemd timer:

*/15 * * * * root /usr/local/bin/check-disk-space.sh >> /var/log/disk-check.log 2>&1

Requirements: Type=oneshot, persistent, 60-second random delay, 128M memory limit.

Solution

# /etc/systemd/system/disk-check.service
[Unit]
Description=Check disk space

[Service]
Type=oneshot
ExecStart=/usr/local/bin/check-disk-space.sh
MemoryMax=128M

# /etc/systemd/system/disk-check.timer
[Unit]
Description=Check disk space every 15 minutes

[Timer]
OnCalendar=*:0/15
Persistent=true
RandomizedDelaySec=60

[Install]
WantedBy=timers.target

sudo systemctl daemon-reload
sudo systemctl enable --now disk-check.timer

Exercise 3: Security audit (15 minutes)

Pick three services and check their security scores:

for svc in sshd nginx postgresql; do
  echo "=== $svc ==="
  systemd-analyze security "$svc" 2>/dev/null | tail -1
done

Which scores worst? Write a drop-in override adding ProtectSystem=strict, NoNewPrivileges=true, and PrivateTmp=true. Restart. Does it still work? If not, what ReadWritePaths= does it need?


Cheat Sheet

Service lifecycle

| Command | Effect |
| --- | --- |
| systemctl start/stop/restart <unit> | Control running state |
| systemctl reload <unit> | Run the unit's ExecReload= (typically SIGHUP: re-read config, no downtime) |
| systemctl enable --now <unit> | Start now + start on boot |
| systemctl mask <unit> | Prevent starting by any means |
| systemctl daemon-reload | Re-read unit files from disk |
| systemctl reset-failed <unit> | Clear "failed" state |

Diagnostics

| Command | Shows |
| --- | --- |
| systemctl status <unit> | State, PID, memory, recent logs |
| systemctl cat <unit> | Effective unit file with overrides |
| systemctl list-units --failed | All failed units |
| journalctl -u <unit> -f | Live logs |
| journalctl -u <unit> -p err --since "1h ago" | Recent errors |
| journalctl -b -1 | Previous boot |
| systemd-analyze security <unit> | Security score |
| systemd-analyze blame | Slowest boot units |
| systemd-cgtop | Live cgroup resource monitor |

Resource directives

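| Directive | On exceed |
| --- | --- |
| MemoryMax= | SIGKILL (OOM) |
| MemoryHigh= | Kernel throttles |
| CPUQuota= | Throttled |
| TasksMax= | Fork fails |
| RuntimeMaxSec= | Service is terminated and marked failed (restarted only if Restart= allows it) |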

Security directives

| Directive | Effect |
| --- | --- |
| ProtectSystem=strict | Filesystem read-only except ReadWritePaths= |
| ProtectHome=true | /home, /root inaccessible |
| PrivateTmp=true | Isolated /tmp |
| NoNewPrivileges=true | No privilege escalation |

Takeaways

  • systemctl status first, journalctl second. The exit code narrows the problem space before you start reading logs.

  • Requires= needs After=. Dependency without ordering means simultaneous startup. Always pair them.

  • MemoryHigh + MemoryMax = graduated response. Soft throttle before hard kill. Never use MemoryMax alone.

  • Timers over cron, always. Persistent missed-run recovery, journal logging, resource limits, random delay. No good reason for new cron jobs.

  • Security hardening is free. ProtectSystem=strict, NoNewPrivileges=true, PrivateTmp=true on every service. Fix the ReadWritePaths= errors that follow.

  • daemon-reload after every unit file change. This will bite you exactly once.