Solution

Triage

  1. Check service status:
    systemctl status data-sync
    
  2. Read the journal for the service:
    journalctl -u data-sync --no-pager -n 100
    
  3. Examine the unit file:
    systemctl cat data-sync
    
  4. Check what environment the service runs in:
    systemctl show data-sync -p Environment,EnvironmentFiles,User,Group,WorkingDirectory
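
One thing worth checking in the output of step 3 is whether every EnvironmentFile= path actually exists. A minimal sketch, assuming the unit text is piped in on stdin; missing_env_files is a hypothetical helper name, not a systemd tool:

```shell
# Hypothetical helper: read unit-file text on stdin and print every
# EnvironmentFile= path that does not exist on disk.
missing_env_files() {
    # Keep only EnvironmentFile= lines and strip the key. An optional "-"
    # prefix (which tells systemd to ignore a missing file) is stripped too,
    # so optional files are reported as well.
    sed -n 's/^[[:space:]]*EnvironmentFile=-\{0,1\}//p' | while IFS= read -r path; do
        # An empty assignment resets the list in systemd; nothing to check.
        [ -n "$path" ] || continue
        [ -e "$path" ] || printf 'missing: %s\n' "$path"
    done
}

# Usage on a systemd host:
#   systemctl cat data-sync | missing_env_files
```

No output means every referenced file is present.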
    

Root Cause

The unit file sets User=datasync, so the service runs as that user. The application requires a configuration file at /etc/data-sync/config.yaml and a database connection string in the environment variable DATABASE_URL. In the development environment, both are supplied by the developer's shell environment and home directory. In production:

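  1. The unit file specifies EnvironmentFile=/etc/data-sync/env, but the file does not exist on this new server, so DATABASE_URL is never set for the service.
  2. The application starts, immediately fails to read the missing config, prints an error to stderr, and exits with code 1. (If the EnvironmentFile= path has no leading "-", systemd itself reports the missing file and fails the start before the application runs; either way the start attempt fails.)
  3. Restart=always makes systemd restart the service almost immediately (the default RestartSec is 100ms). After 5 failed starts within 10 seconds, the start limit (StartLimitBurst=5) is hit and systemd marks the service as failed.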
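
The timing in item 3 can be checked with a little arithmetic. The sketch below is a simplified model, not systemd's exact algorithm: it assumes the process exits instantly, so consecutive starts are spaced one restart delay apart, and it assumes the default values RestartSec=100ms and StartLimitIntervalSec=10s. The function name start_limit_trips is hypothetical.

```shell
# Simplified model (an assumption): starts are spaced restart_ms apart.
# systemd allows `burst` starts per interval, so the limit trips when start
# number burst+1 still falls inside the interval that began with the first
# start, i.e. when burst * restart_ms <= interval_ms.
start_limit_trips() {
    restart_ms=$1 interval_ms=$2 burst=$3
    [ $(( burst * restart_ms )) -le "$interval_ms" ]
}

# Assumed defaults: RestartSec=100ms, StartLimitIntervalSec=10s, StartLimitBurst=5.
if start_limit_trips 100 10000 5; then
    echo "100ms delay: limit trips"
fi

# With a 10s restart delay and the same limits, restarts are too far apart.
if ! start_limit_trips 10000 10000 5; then
    echo "10s delay: limit does not trip"
fi
```

Under this model, a long restart delay combined with a short start-limit interval means the service keeps retrying rather than entering the failed state. Whether that is desirable depends on how the service is monitored.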

Running the binary manually as root masks the problem: the engineer's shell exports DATABASE_URL from .bashrc, so the application finds the variable that the service never receives.

Fix

  1. Create the environment file:

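    mkdir -p /etc/data-sync
    # Quoting 'EOF' stops the shell from expanding $ characters in the password
    cat > /etc/data-sync/env <<'EOF'
    DATABASE_URL=postgresql://datasync:password@db.internal:5432/sync_prod
    LOG_LEVEL=info
    EOF
    # Restrict access: the file contains the database password
    chmod 600 /etc/data-sync/env
    chown datasync:datasync /etc/data-sync/env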
    

  2. Ensure the config file exists:

    cp /opt/data-sync/config.yaml.example /etc/data-sync/config.yaml
    # Edit with production values
    chown datasync:datasync /etc/data-sync/config.yaml
    

  3. Reset the failure counter and start:

    systemctl reset-failed data-sync
    systemctl start data-sync
    systemctl status data-sync
    

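  4. Add a restart delay to prevent rapid flapping. Put it in a drop-in created with systemctl edit data-sync rather than editing the packaged unit file: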

    [Service]
    RestartSec=10s
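
Before starting the service, the environment file from step 1 can be sanity-checked. This is a minimal sketch with a hypothetical helper name, check_env_file; it assumes GNU coreutils stat (reasonable on a systemd host) and only verifies the three properties the steps above set up: the file exists, has mode 600, and defines a non-empty DATABASE_URL.

```shell
# Hypothetical pre-flight check for the environment file created in step 1.
# Usage: check_env_file /etc/data-sync/env
check_env_file() {
    file=$1
    [ -f "$file" ] || { echo "missing: $file"; return 1; }

    # stat -c is GNU coreutils syntax (an assumption about the host).
    mode=$(stat -c '%a' "$file")
    [ "$mode" = "600" ] || { echo "mode is $mode, expected 600"; return 1; }

    # Require a KEY=VALUE line with a non-empty DATABASE_URL value.
    grep -q '^DATABASE_URL=.' "$file" || { echo "DATABASE_URL not set in $file"; return 1; }

    echo "ok: $file"
}
```

The function returns non-zero on the first problem it finds, so it can gate the systemctl start in a deployment script.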
    

Rollback / Safety

  • If the service still fails after fixing the config, follow the journal in real time while starting it: journalctl -u data-sync -f.
  • Verify the datasync user can read all required files: sudo -u datasync cat /etc/data-sync/config.yaml.
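  • Approximate the service's environment with a transient unit: systemd-run --uid=datasync --gid=datasync /opt/data-sync/bin/data-sync. A transient unit does not inherit the settings of data-sync.service, so add -p EnvironmentFile=/etc/data-sync/env to load the same variables, and --pipe to see the output on the terminal instead of in the journal.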
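
The permission check above can be extended to several files at once. A minimal sketch, assuming a hypothetical helper unreadable_files that tests readability for whichever user runs it, so it has to be invoked as the service user to be meaningful:

```shell
# Hypothetical helper: print each given file that the *current* user cannot
# read, and return non-zero if any file was unreadable or missing.
unreadable_files() {
    rc=0
    for f in "$@"; do
        if [ ! -r "$f" ]; then
            printf 'unreadable: %s\n' "$f"
            rc=1
        fi
    done
    return "$rc"
}

# Usage as the service user (the helpers.sh path is a placeholder):
#   sudo -u datasync sh -c '. /path/to/helpers.sh; unreadable_files /etc/data-sync/config.yaml'
```

Running it as root would be misleading, since root can read files regardless of their mode bits.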

Common Traps

  • Forgetting systemctl reset-failed. While the start limit is in effect, systemctl start and systemctl restart are refused with a "start request repeated too quickly" error. systemctl reset-failed clears the counter immediately instead of waiting for the limit interval to pass.
  • Assuming the shell environment equals the systemd environment. systemd services start with a minimal environment: .bashrc is never sourced, so PATH changes and variables exported in an interactive shell are absent.
  • Using Type=simple for a forking daemon. If the process forks and the parent exits, systemd treats the service as having exited. Use Type=forking or Type=notify as appropriate.
  • Not checking permissions for the service user. Files readable by root are not necessarily readable by the service user.
  • Editing the unit file in /lib/systemd/system/ instead of using an override. Use systemctl edit data-sync to create a drop-in override that survives package updates.
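
The shell-versus-systemd environment trap can be demonstrated without systemd. The sketch below uses env -i to start a child process with an empty environment. This only approximates a service's environment (systemd does set a few variables, such as PATH), and the connection string is a placeholder.

```shell
# Simulate the engineer's interactive shell, where .bashrc exported the variable.
# The value is a placeholder, not a real connection string.
export DATABASE_URL='postgresql://example.invalid/placeholder'

# A child of the interactive shell inherits the exported variable.
in_shell=$(sh -c 'printf "%s" "${DATABASE_URL:-unset}"')

# env -i starts the child with an empty environment, roughly like a service
# whose unit provides no Environment= or EnvironmentFile= for the variable.
in_clean=$(env -i /bin/sh -c 'printf "%s" "${DATABASE_URL:-unset}"')

printf 'interactive shell: %s\nclean environment: %s\n' "$in_shell" "$in_clean"
```

The second line reports the variable as unset, which is the situation the application is in when started by systemd without the environment file.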