Solution
Triage
- Check service status:
- Read the journal for the service:
- Examine the unit file:
- Check what environment the service runs in:
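The four triage steps above might map to commands like the ones below. This is a sketch, not the original commands: it assumes the unit is named data-sync (the name used with journalctl and systemctl edit later in this document), and the function name triage, the 50-line journal window, and the properties queried are illustrative choices.

```bash
#!/usr/bin/env bash
# Triage sketch. The unit name defaults to data-sync (an assumption).
triage() {
  local unit=${1:-data-sync}

  # 1. Service status: active state, last exit code, a few recent log lines.
  systemctl status "$unit" --no-pager

  # 2. Journal for the service, most recent 50 entries.
  journalctl -u "$unit" -n 50 --no-pager

  # 3. Unit file plus any drop-in overrides, as systemd sees them.
  systemctl cat "$unit"

  # 4. User and environment the service is configured to run with.
  systemctl show "$unit" -p User -p Environment -p EnvironmentFiles
}
```

Calling triage with no argument inspects data-sync; pass another unit name to reuse it elsewhere.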
Root Cause
The service runs as User=datasync in the systemd unit file. The application requires a configuration file at /etc/data-sync/config.yaml and a database connection string from the environment variable DATABASE_URL. In the development environment, these are present in the developer's shell environment and home directory. In production:
- EnvironmentFile=/etc/data-sync/env is specified in the unit file, but the file does not exist on this new server.
- The application starts, immediately fails to read the missing config, prints an error to stderr, and exits with code 1.
- systemd's Restart=always restarts it immediately. After 5 failures in 10 seconds, StartLimitBurst=5 is hit and systemd marks the service as failed.
When run manually as root, the engineer's shell has DATABASE_URL exported from .bashrc, masking the problem.
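The masking effect can be reproduced without systemd. The sketch below is an illustration under stated assumptions: the helper name visible_in_clean_env and the placeholder URL are hypothetical, and env -i starts the child with a completely empty environment, which is stricter than the small default environment systemd provides. The effect on variables that exist only in a login shell is the same.

```bash
#!/usr/bin/env bash
# Report whether a variable is visible to a child process that starts
# with an empty environment (env -i clears everything).
visible_in_clean_env() {
  local name=$1
  env -i /bin/sh -c "if [ -n \"\${$name+x}\" ]; then echo set; else echo unset; fi"
}

# Simulate a variable exported from .bashrc (placeholder value, not a real URL).
export DATABASE_URL='postgres://placeholder.invalid/db'

echo "interactive shell: ${DATABASE_URL:+set}"
echo "clean environment: $(visible_in_clean_env DATABASE_URL)"

# To reproduce the production failure by hand, a similar approach is:
#   sudo -u datasync env -i /opt/data-sync/bin/data-sync
```

The first line reports the variable as set and the second as unset, which is why a manual run from the engineer's shell succeeds while the service fails.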
Fix
- Create the environment file:
- Ensure the config file exists:
- Reset the failure counter and start:
- Add a restart delay to prevent rapid flapping:
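The fix steps above could be scripted roughly as follows. This is a sketch under assumptions, not the original commands: the helper names, the DATABASE_URL value passed in, and RestartSec=5 are illustrative, while the paths and the unit name come from this document. The drop-in is written before the service is started so that the delay is already in effect.

```bash
#!/usr/bin/env bash
# Sketch of the fix. Helper names and RestartSec=5 are assumptions.

# Write the environment file as plain KEY=value lines (no "export").
write_env_file() {
  local file=$1 database_url=$2
  ( umask 077; printf 'DATABASE_URL=%s\n' "$database_url" > "$file" )
}

# Write a drop-in override that delays automatic restarts.
write_restart_dropin() {
  local dropin_dir=$1
  mkdir -p "$dropin_dir"
  printf '[Service]\nRestartSec=5\n' > "$dropin_dir/override.conf"
}

# Apply all steps against the real paths. Run as root. Not called automatically.
apply_fix() {
  local unit=data-sync database_url=$1

  # Create the environment file. It holds a credential, so keep it private;
  # systemd loads EnvironmentFile= itself, so root-only access is normally enough.
  write_env_file /etc/data-sync/env "$database_url"
  chmod 0600 /etc/data-sync/env

  # Stop early if the config file is missing or unreadable for the service user.
  sudo -u datasync test -r /etc/data-sync/config.yaml || {
    echo "config.yaml is missing or unreadable by datasync" >&2
    return 1
  }

  # Add the restart delay, then make systemd re-read unit files.
  write_restart_dropin "/etc/systemd/system/$unit.service.d"
  systemctl daemon-reload

  # Clear the start-limit counter, then start the service.
  systemctl reset-failed "$unit"
  systemctl start "$unit"
}
```

A call would look like apply_fix 'postgres://user:password@db.example.internal/datasync', with the real connection string substituted for that placeholder. One consequence of a 5-second delay: five starts no longer fit inside a 10-second window, so the start limit described in the root cause would stop triggering unless the interval is widened as well.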
Rollback / Safety
- If the service still fails after fixing the config, follow the journal in real time: journalctl -u data-sync -f.
- Verify the datasync user can read all required files: sudo -u datasync cat /etc/data-sync/config.yaml.
- Test the binary under a clean, systemd-managed environment as the service user: systemd-run --uid=datasync --gid=datasync /opt/data-sync/bin/data-sync.
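When more than one file is involved, the readability check can be looped. The helper below is a sketch; the name unreadable_by is hypothetical, and it assumes sudo is available and allowed to switch to the service user.

```bash
#!/usr/bin/env bash
# Print every path the given user cannot read.
# Returns non-zero if at least one path is unreadable or missing.
unreadable_by() {
  local user=$1 status=0 path
  shift
  for path in "$@"; do
    if ! sudo -u "$user" test -r "$path"; then
      echo "$path"
      status=1
    fi
  done
  return "$status"
}
```

For example, unreadable_by datasync /etc/data-sync/config.yaml prints nothing and returns 0 when the file is readable, and prints the path when it is not.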
Common Traps
- Forgetting systemctl reset-failed. While the start limit is in effect, systemctl restart is refused; systemctl reset-failed clears the failure counter right away.
- Assuming the shell environment equals the systemd environment. systemd services run in a clean environment. There is no .bashrc, no PATH modifications, no exported variables.
- Using Type=simple for a forking daemon. If the process forks and the parent exits, systemd treats the service as having exited. Use Type=forking or Type=notify as appropriate.
- Not checking permissions for the service user. Files readable by root are not necessarily readable by the service user.
- Editing the unit file in /lib/systemd/system/ instead of using an override. Use systemctl edit data-sync to create a drop-in override that survives package updates.
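For the last trap, a quick check before editing shows where the unit file actually lives. This is a sketch: the function names are hypothetical, and it assumes the --value option of systemctl show is available on the installed systemd version.

```bash
#!/usr/bin/env bash
# Return success if a unit file path lives in a package-managed directory.
is_vendor_unit_path() {
  case $1 in
    /lib/systemd/system/*|/usr/lib/systemd/system/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Warn before hand-editing a packaged unit file. Not called automatically.
check_unit_location() {
  local unit=${1:-data-sync} fragment
  fragment=$(systemctl show -p FragmentPath --value "$unit")
  if is_vendor_unit_path "$fragment"; then
    echo "$fragment is package-managed; use: systemctl edit $unit"
  else
    echo "$fragment is a local unit file"
  fi
}
```

Running check_unit_location data-sync before opening an editor makes it harder to change a file that the next package update will overwrite.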