systemctl & journalctl Footguns
Mistakes that make services fail silently, make logs disappear, and leave units refusing to start with cryptic errors.
1. Forgetting daemon-reload After Unit File Changes
You edit /etc/systemd/system/myapp.service. You run systemctl restart
myapp. It restarts with the old configuration. You check the file -- your
changes are there. You restart again. Same result.
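systemd caches unit file definitions in memory. Until you run
daemon-reload, it ignores changes on disk. Recent versions of systemctl
print a warning that the unit file changed on disk, but it is easy to miss
in a deploy log.
Why people do it: Most init scripts and process supervisors re-read their configuration on restart. systemd does not.
Fix: After any unit file change, run systemctl daemon-reload before
restarting. Make it a single muscle-memory command
(systemctl daemon-reload && systemctl restart myapp). In deployment
scripts, always include daemon-reload before any restart.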
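As a sketch of that habit, here is a small POSIX shell function that reloads, restarts, and then checks that the unit is active. The name reload_restart is hypothetical (it is not part of systemd), and it assumes it runs as root.

```shell
#!/bin/sh
# Hypothetical helper: make systemd re-read unit files, restart the unit,
# and confirm it came back. Run as root.
reload_restart() {
    local unit="$1"
    # Re-read all unit files and drop-ins from disk first.
    systemctl daemon-reload || return 1
    systemctl restart "$unit" || return 1
    # --quiet prints nothing; the exit status says whether it is active.
    systemctl is-active --quiet "$unit"
}

# Usage:
#   reload_restart myapp.service
```

Stopping at the first failure keeps a failed reload from being masked by a restart that quietly reuses the old definition.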
2. Confusing systemctl enable with systemctl start
You deploy a new service. You run systemctl enable myapp. You check
status: "inactive (dead)." You assume it will start on next boot. Meanwhile,
anything that depends on it right now is broken.
enable creates boot-time symlinks. start runs the service now. They
do completely different things.
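Fix: Use systemctl enable --now myapp to do both, or run systemctl enable myapp
followed by systemctl start myapp. Always verify both halves:
systemctl is-active myapp.service should print "active" and
systemctl is-enabled myapp.service should print "enabled".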
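A sketch of the enable-and-verify step as a shell function. The name enable_now_verify is hypothetical. It compares the text printed by systemctl is-enabled rather than relying on its exit status, because is-enabled also exits 0 for states such as "static" that do not mean the unit starts at boot.

```shell
#!/bin/sh
# Hypothetical helper: enable a unit for boot AND start it now, then check
# both states, since "enabled" and "active" are independent.
enable_now_verify() {
    local unit="$1" state
    systemctl enable --now "$unit" || return 1
    # Compare the printed state: is-enabled also exits 0 for "static",
    # "indirect" and similar states that do not mean "starts at boot".
    state=$(systemctl is-enabled "$unit" 2>/dev/null) || true
    if [ "$state" != enabled ]; then
        echo "$unit: is-enabled reports '$state', expected 'enabled'" >&2
        return 1
    fi
    if ! systemctl is-active --quiet "$unit"; then
        echo "$unit: not active" >&2
        return 1
    fi
    echo "$unit: enabled and active"
}

# Usage:
#   enable_now_verify myapp.service
```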
3. Mask Preventing Start Without Clear Error
During an incident, you ran systemctl mask myapp to prevent it from
starting. The incident resolved. Six months later, someone tries to start
myapp. The error says "Unit myapp.service is masked." Nobody remembers
masking it. Nobody knows what masking means.
mask creates a symlink from the unit name to /dev/null (for example,
/etc/systemd/system/myapp.service pointing at /dev/null). It survives
reboots, enable commands, and package reinstalls. It is the nuclear option.
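Because a mask is only a symlink to /dev/null, you can also spot masks directly on disk. A sketch, assuming persistent masks in /etc/systemd/system (systemctl mask --runtime places them under /run/systemd/system instead); list_masked is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: print the names of units masked in a directory,
# i.e. entries that are symlinks whose target is exactly /dev/null.
list_masked() {
    local dir="${1:-/etc/systemd/system}" f
    for f in "$dir"/*; do
        if [ -L "$f" ] && [ "$(readlink "$f")" = /dev/null ]; then
            echo "${f##*/}"
        fi
    done
}

# Usage:
#   list_masked                        # persistent masks
#   list_masked /run/systemd/system    # runtime masks
```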
Why people do it: mask is the only way to prevent a unit from
starting even as a dependency. It is the right tool sometimes. The problem
is forgetting it was used.
Fix:
```shell
# Find all masked units
systemctl list-unit-files --state=masked
# Unmask
systemctl unmask myapp.service
```
Prefer disable over mask unless you specifically need to block
dependency-triggered starts. If you mask, document it in your runbook.
4. Type=forking for a Non-Forking Process
Your application stays in the foreground. You set Type=forking in the
unit file. systemd waits for the parent process to exit (because that is
what forking daemons do). The parent never exits because the app runs in
foreground. After TimeoutStartSec, systemd kills it and marks the unit
as failed.
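Or the reverse: the app forks and Type=simple is set. The parent exits,
so systemd concludes the service has stopped and, with the default
KillMode=control-group, kills the rest of the service's cgroup, including
the child that was running fine. The daemon appears to start and then
silently vanishes.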
Fix: Match Type= to the actual behavior:
| Process behavior | Correct Type |
|---|---|
| Stays in foreground | simple or exec |
| Forks into background | forking (set PIDFile= too) |
| Signals readiness via sd_notify | notify |
| Runs and exits (setup script) | oneshot |
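Check the application docs or run it manually to see whether it forks.
If a forking daemon has an option to stay in the foreground, use that with
Type=exec. The flag varies by program: sshd uses -D, nginx uses
-g 'daemon off;', and others use something like --no-daemon or --foreground.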
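One way to "run it manually" is to launch the program under coreutils timeout and see whether the process you started returns. This is a rough sketch; probe_forking and the 3-second default are assumptions, and the result is a hint, not proof. "returned (exit status 0)" means the program either forked a daemon (Type=forking) or simply finished (Type=oneshot), so check ps to tell which. A daemon it forked keeps running after the probe; stop it afterwards.

```shell
#!/bin/sh
# Hypothetical probe: does the command stay in the foreground?
# Prints "foreground" if the launched process is still running after
# PROBE_SECS seconds (suggests Type=simple or Type=exec); otherwise prints
# "returned" with its exit status (suggests Type=forking or Type=oneshot).
probe_forking() {
    local secs="${PROBE_SECS:-3}" rc=0
    # Discard output so a daemonized child cannot keep our stdout open.
    timeout "$secs" "$@" >/dev/null 2>&1 || rc=$?
    # timeout exits with 124 when it had to stop the command.
    if [ "$rc" -eq 124 ]; then
        echo foreground
    else
        echo "returned (exit status $rc)"
    fi
}

# Usage (the binary path is a placeholder):
#   probe_forking /usr/local/bin/myapp
```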
5. Environment Variables Not Inherited
Your service works when you run it manually from a shell. Under systemd,
it fails with "variable not found" or "connection refused" (because
DATABASE_URL is unset).
systemd provides a minimal environment. It does not inherit your shell's
env vars, .bashrc, .profile, or anything in /etc/environment by
default.
Fix: Explicitly declare environment:
```ini
[Service]
# Inline variables
Environment="DATABASE_URL=postgres://localhost/mydb"
Environment="LOG_LEVEL=info"
# Or load from a file (one VAR=value per line)
EnvironmentFile=/etc/myapp/env
```
The EnvironmentFile approach is better for secrets -- the file can be
mode 0600 and owned by root, while the unit file is world-readable.
Do not use shell wrapper scripts to set environment. Use Environment=
or EnvironmentFile= directly.
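A sketch of creating that environment file without ever exposing the secret: set a restrictive umask before the file exists instead of running chmod afterwards, and write to a temporary file that is renamed into place. write_env_file is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: write KEY=value lines from stdin to an
# EnvironmentFile= path with mode 0600, owned by the caller (root).
write_env_file() {
    local path="$1"
    mkdir -p "$(dirname "$path")" || return 1
    # Subshell so the umask change does not leak into the caller.
    (
        # New files get mode 0600 from the moment they exist.
        umask 077
        tmp="$path.tmp.$$"
        # Write a temp file, then rename it into place, so a service
        # starting concurrently never reads a half-written file.
        if cat > "$tmp"; then
            mv -f "$tmp" "$path"
        else
            rm -f "$tmp"
            exit 1
        fi
    )
}

# Usage:
#   printf 'DATABASE_URL=postgres://localhost/mydb\nLOG_LEVEL=info\n' |
#       write_env_file /etc/myapp/env
```

systemd reads EnvironmentFile= when the service starts, so restart the service after changing the file; its contents do not need a daemon-reload.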
6. Restart=always Without Rate Limiting Fills Logs
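Your service has a config error. It starts, crashes immediately, and
systemd restarts it. With Restart=always, RestartSec=0, and
StartLimitIntervalSec=0 (which disables start-rate limiting), this
happens many times per second. Each crash writes log lines, to the journal
and often to the application's own files under /var/log. journald caps its
own disk usage by default, but application log files have no such cap, so
/var fills up and other services that need to write to /var start failing.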
Why people do it: Restart=always is the standard resilience setting.
The problem is missing rate limiting.
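Fix: Always pair Restart= with a nonzero RestartSec=, and check that
start-rate limiting has not been disabled. StartLimitIntervalSec= and
StartLimitBurst= belong in the [Unit] section, not [Service]. For example,
StartLimitIntervalSec=300, StartLimitBurst=5, and RestartSec=5 allow 5
starts in 5 minutes, with 5 seconds between each. Once the limit is hit,
the unit enters the failed state and systemd stops retrying.
The defaults (StartLimitIntervalSec=10s, StartLimitBurst=5) already stop a
crash loop after 5 starts in 10 seconds, but with RestartSec=0 those 5
crashes happen almost instantly, and each can dump a full stack trace. The
runaway log growth described above happens when the limit has been
disabled, often to stop a flapping unit from "giving up".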
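A sketch of a drop-in carrying the rate-limited restart policy described above. It is written by a small shell function so the target directory can be swapped for testing; the function name and the 10-restart.conf file name are hypothetical. Note that the StartLimit* settings sit under [Unit].

```shell
#!/bin/sh
# Hypothetical helper: install a rate-limited restart policy as a drop-in.
# The optional second argument points it somewhere other than /etc.
write_restart_policy() {
    local unit="$1" root="${2:-/etc/systemd/system}"
    mkdir -p "$root/$unit.d" || return 1
    cat > "$root/$unit.d/10-restart.conf" <<'EOF'
[Unit]
# At most 5 starts within 300 seconds; after that the unit is marked
# failed and systemd stops restarting it.
StartLimitIntervalSec=300
StartLimitBurst=5

[Service]
Restart=always
# Wait 5 seconds between a crash and the next start attempt.
RestartSec=5
EOF
}

# Usage (then apply it):
#   write_restart_policy myapp.service
#   systemctl daemon-reload
```

After the reload, systemctl show myapp -p RestartUSec -p StartLimitIntervalUSec -p StartLimitBurst reports the effective values. Once a unit has hit the limit, fix the cause and run systemctl reset-failed myapp before starting it again.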
7. WantedBy=multi-user.target Missing from [Install]
You create a service unit. You run systemctl enable myapp. Instead of
"Created symlink...", it prints a long warning that the unit has no
installation config, which is easy to skim past. At boot, myapp does not
start.
Without WantedBy= in the [Install] section, systemctl enable has
nowhere to create the symlink. Some unit files ship without [Install]
entirely -- these are "static" units meant to be started only as
dependencies of other units.
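Fix: Add an [Install] section with WantedBy=multi-user.target, the target
a normal server boot reaches, then run systemctl daemon-reload and
systemctl enable myapp again. Verify with systemctl is-enabled myapp: it
should print "enabled". If it says "static", the unit has no [Install] section.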
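A small sketch that adds the missing section only when it is absent. ensure_install_section is a hypothetical helper; it does not check whether an existing [Install] section actually contains WantedBy=.

```shell
#!/bin/sh
# Hypothetical helper: append a default [Install] section to a unit file
# that lacks one. Running it twice does not add a second section.
ensure_install_section() {
    local file="$1"
    if grep -q '^\[Install\]' "$file"; then
        echo "$file: already has an [Install] section"
        return 0
    fi
    # multi-user.target is reached on every normal boot, graphical or not,
    # because graphical.target pulls it in.
    printf '\n[Install]\nWantedBy=multi-user.target\n' >> "$file"
    echo "$file: added [Install] WantedBy=multi-user.target"
}

# Usage, then apply and verify:
#   ensure_install_section /etc/systemd/system/myapp.service
#   systemctl daemon-reload
#   systemctl enable myapp.service
#   systemctl is-enabled myapp.service   # expect "enabled", not "static"
```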
8. journalctl --vacuum Deleting Valuable Logs
You run journalctl --vacuum-time=7d to free disk space. It deletes every
archived journal file whose entries are all older than 7 days, including the
logs from last month's incident that you were still investigating.
Vacuum operations are immediate and irreversible. There is no "are you sure?" prompt. There is no trash or undo.
Fix: Before vacuuming, export logs you might need:
```shell
# Export specific unit logs before vacuuming
journalctl -u myapp --since "2025-02-01" --until "2025-03-01" \
    -o export > myapp-feb-2025.export
# Then vacuum
journalctl --vacuum-time=7d
```
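Better yet, set retention limits (for example SystemMaxUse= and
MaxRetentionSec= in the [Journal] section) in /etc/systemd/journald.conf or
a drop-in under /etc/systemd/journald.conf.d/, so journald enforces them as
it rotates files. Manual vacuum operations then become rare.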
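A sketch of retention settings as a journald drop-in, written by a hypothetical helper. The two limits are placeholders: pick a MaxRetentionSec= longer than the longest window in which you might still be investigating an incident, or automatic rotation will delete those logs just as a manual vacuum would.

```shell
#!/bin/sh
# Hypothetical helper: cap journal size and age with a journald.conf drop-in.
# The limits are placeholders; size them for your disk and your
# incident-investigation window.
write_journald_limits() {
    local root="${1:-/etc/systemd}"
    mkdir -p "$root/journald.conf.d" || return 1
    cat > "$root/journald.conf.d/50-retention.conf" <<'EOF'
[Journal]
# Total disk space persistent journal files may use.
SystemMaxUse=2G
# Remove entries older than this (whole archived files at a time).
MaxRetentionSec=1month
EOF
}

# Usage:
#   write_journald_limits
#   systemctl restart systemd-journald
```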
9. Timer Persistent=true Not Set
You create a timer that runs a backup at 2 AM daily. The server is off for maintenance from 1 AM to 5 AM. The 2 AM backup does not run. Nobody notices until the next incident when the backup is needed and it is stale.
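Without Persistent=true, if the system is off (or the timer is inactive)
when an OnCalendar= timer should fire, the run is silently lost.
Fix: Add Persistent=true to the [Timer] section next to OnCalendar=.
With Persistent=true, systemd records on disk when the timer last fired
and, when the timer is next activated (for example at boot), immediately
triggers any run that was missed.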
This should be the default for every timer that does meaningful work.
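A sketch of the fix for the 2 AM backup example: a timer unit with Persistent=true, written by a hypothetical shell helper. It assumes a backup.service already exists; by default a timer activates the service with the same name.

```shell
#!/bin/sh
# Hypothetical helper: write a daily 02:00 timer for an existing
# backup.service (both unit names are placeholders).
write_backup_timer() {
    local root="${1:-/etc/systemd/system}"
    mkdir -p "$root" || return 1
    cat > "$root/backup.timer" <<'EOF'
[Unit]
Description=Daily backup at 02:00

[Timer]
OnCalendar=*-*-* 02:00:00
# Catch up on a missed run when the timer is next activated (e.g. at boot).
Persistent=true

[Install]
WantedBy=timers.target
EOF
}

# Usage:
#   write_backup_timer
#   systemctl daemon-reload
#   systemctl enable --now backup.timer
#   systemctl list-timers backup.timer   # check the LAST and NEXT columns
```

Before enabling, systemd-analyze calendar '*-*-* 02:00:00' shows when the expression next elapses, a quick sanity check on the schedule.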
10. Drop-in Override Precedence Confusion
You create /etc/systemd/system/nginx.service.d/override.conf to change
ExecStart. But you also have an old 10-custom.conf in the same
directory. The files are applied in lexicographic order: 10-custom.conf
first, then override.conf. If both set ExecStart=, the last one wins, but
only if it clears the value first with an empty ExecStart= line.
If 10-custom.conf sets ExecStart=/usr/sbin/nginx -g 'daemon off;'
and override.conf sets ExecStart=/usr/sbin/nginx -c /custom.conf without
clearing, the values accumulate and systemd refuses to load the unit,
because a non-oneshot service ends up with more than one ExecStart=.
Fix: Use numbered prefixes and always clear before setting:
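```ini
# /etc/systemd/system/nginx.service.d/10-execstart.conf
[Service]
# 'daemon off;' keeps nginx in the foreground, so the forking type used
# by the stock nginx.service no longer matches (see footgun 4).
Type=exec
# Empty assignment clears the inherited ExecStart= list first.
ExecStart=
ExecStart=/usr/sbin/nginx -g 'daemon off;' -c /etc/nginx/custom.conf
```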
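To see the effective merged configuration, run systemctl cat nginx: it
prints the unit file followed by each drop-in that applies. To find all
overrides across the system, run systemd-delta --type=extended,overridden.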
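To make the ordering rule concrete, here is a simplified sketch of how drop-ins for one unit are merged: collect the *.conf files from each unit.d directory, let a file in a higher-priority directory hide a same-named file in a lower one, and apply the survivors sorted by file name regardless of directory. list_dropins is hypothetical; it only models three directories and ignores type-wide drop-ins such as service.d/, so systemctl cat remains the authoritative view.

```shell
#!/bin/sh
# Simplified model of drop-in merging for one unit (hypothetical helper).
# Arguments after the unit name are search directories, highest priority
# first; the defaults are three common system locations.
list_dropins() {
    local unit="$1" dir f
    shift
    if [ $# -eq 0 ]; then
        set -- /etc/systemd/system /run/systemd/system /usr/lib/systemd/system
    fi
    for dir in "$@"; do
        for f in "$dir/$unit.d"/*.conf; do
            [ -e "$f" ] || continue
            # Emit "basename<TAB>full path".
            printf '%s\t%s\n' "${f##*/}" "$f"
        done
    done |
        # Keep only the first (highest-priority) file for each basename...
        awk -F '\t' '!seen[$1]++' |
        # ...then order the survivors by basename, byte-wise.
        LC_ALL=C sort -t "$(printf '\t')" -k1,1 |
        cut -f2
}

# Usage:
#   list_dropins nginx.service
```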
11. ExecStart= (Empty Value) to Clear: Missed or Misunderstood
In a drop-in override, you want to change ExecStart, so you add a single
new ExecStart= line. systemd rejects this because the original unit
already has an ExecStart= and services other than Type=oneshot allow only
one. You need to clear it first by putting an empty ExecStart= line before
the new one. The empty ExecStart= resets the list; the next line sets the
new value.
This pattern applies to several list-type directives: ExecStart=,
ExecStartPre=, ExecStartPost=, ExecStop=, ExecStopPost=.
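A rough lint for this rule, as a sketch: flag a drop-in that assigns a list-type directive without first resetting it with an empty assignment. check_cleared is a hypothetical name; it only understands simple KEY=value lines, and it is meant for drop-ins, not the main unit file, where the first assignment legitimately needs no reset.

```shell
#!/bin/sh
# Hypothetical lint: in a drop-in FILE, report any non-empty DIRECTIVE=
# assignment that is not preceded by an empty "DIRECTIVE=" reset.
# Returns 1 if anything was reported.
check_cleared() {
    local directive="$1" file="$2"
    awk -v d="$directive" '
        # Empty assignment, e.g. "ExecStart=": the list is reset.
        $0 ~ ("^[ \t]*" d "[ \t]*=[ \t]*$") { cleared = 1; next }
        # Non-empty assignment before any reset: this appends instead.
        $0 ~ ("^[ \t]*" d "[ \t]*=") && !cleared {
            printf "%s:%d: %s= set without an empty %s= reset first\n", FILENAME, FNR, d, d
            bad = 1
        }
        END { if (bad) exit 1 }
    ' "$file"
}

# Usage:
#   check_cleared ExecStart /etc/systemd/system/myapp.service.d/override.conf
#   check_cleared ExecStartPre /etc/systemd/system/myapp.service.d/override.conf
```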
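Gotcha within the gotcha: Environment= is additive (a later assignment to
the same variable overrides the earlier one), so it does NOT need clearing.
An empty Environment= wipes every variable assigned before it, including
those from the main unit file and from drop-ins that sort earlier.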
12. Not Using ProtectSystem/PrivateTmp for Security
Your custom service runs as a non-root user but has no sandboxing directives. A vulnerability in the application gives an attacker a shell as the service user. Without sandboxing, the attacker can:
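- Read /etc/shadow (if the service user can)
- Write to /tmp (shared with all processes, which enables symlink attacks)
- Read other users' home directories (where file permissions allow)
- Load kernel modules (if they escalate to root)
- Access the full device tree

A few sandboxing directives in the [Service] section, such as
ProtectSystem=, ProtectHome=, PrivateTmp=, PrivateDevices=, and
ProtectKernelModules=, block most of this.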
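A sketch of a baseline sandbox delivered as a drop-in by a hypothetical helper. The directive set is a common starting point, not a complete policy, and /var/lib/myapp is a placeholder for wherever the service must write.

```shell
#!/bin/sh
# Hypothetical helper: add a baseline sandbox to a service via a drop-in.
# Test after applying: ProtectSystem=strict makes the file system
# read-only for the service, so list every path it must write to.
write_sandbox() {
    local unit="$1" root="${2:-/etc/systemd/system}"
    mkdir -p "$root/$unit.d" || return 1
    cat > "$root/$unit.d/20-sandbox.conf" <<'EOF'
[Service]
# Mount the whole file system read-only (except /dev, /proc, /sys)...
ProtectSystem=strict
# ...except this placeholder path; a leading "-" ignores it if missing.
ReadWritePaths=-/var/lib/myapp
# Make /home, /root and /run/user inaccessible and empty.
ProtectHome=true
# Private /tmp and /var/tmp, not shared with other processes.
PrivateTmp=true
# Minimal /dev with pseudo devices only, no physical devices.
PrivateDevices=true
# Deny explicit kernel module loading.
ProtectKernelModules=true
# No privilege gain through setuid/setgid binaries or file capabilities.
NoNewPrivileges=true
EOF
}

# Usage:
#   write_sandbox myapp.service
#   systemctl daemon-reload && systemctl restart myapp.service
```

Afterwards, rerun systemd-analyze security myapp.service to see the new score, and check the journal for permission errors on paths the service turns out to need.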
Why people skip it: "It works without it." Sandboxing is invisible until it prevents an attack or breaks a service that accesses unexpected paths.
Fix: Add sandboxing to every custom service. Use systemd-analyze
security myapp.service to check the exposure score and find which
directives to add. Aim for a score below 5.0.
A score of 9.6 (typical for a unit with no sandboxing directives) means the service has nearly unrestricted system access. Each directive you add lowers the score.