Answer Key: The Service That Won't Start
The System
The system is a nightly batch data export pipeline on a dedicated batch server (prod-batch-01). A systemd service, started by a timer (or possibly by cron), runs a shell script (export.sh) that extracts transaction and user event data from a database and exports it, likely to S3, SFTP, or a data warehouse.
[systemd timer] --> [data-exporter.service]
|
/opt/data-exporter/bin/export.sh
|
[Database] --> [Export Target]
|
Pushes metrics: data_export_rows_total, data_export_last_success_timestamp
[crond] --> /opt/data-exporter/bin/cleanup.sh (separate maintenance job)
Ansible manages deployment of the script and systemd unit file. The system also has a separate cron job running cleanup.sh (probably removing old export files).
What's Broken
Root cause: The Ansible playbook deploy-exporter.yml sets the file mode to 0644 (read-write for the owner, read-only for group and others), so the script has no execute permission. When systemd tries to run /opt/data-exporter/bin/export.sh as the ExecStart command, the exec fails with EACCES (permission denied) and the service fails with exit code 126, the conventional status for a command that was found but could not be executed.
The playbook was run at 18:45 on Oct 1 (visible in the Ansible log), overwriting the script with correct content but wrong permissions. The next scheduled run at 02:00 on Oct 2 failed.
Key clue: Exit code 126 in the systemctl status output, combined with ls -la showing -rw-r--r-- (no x bit), and the Ansible task explicitly setting mode: "0644".
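The exit-code clue can be reproduced without systemd. The following is a minimal sketch, assuming a POSIX shell and a scratch directory whose filesystem allows execution; the temporary export.sh is a stand-in for the real script, not a copy of it.

```shell
# Reproduce the failure mode in a scratch directory (no systemd involved).
workdir="$(mktemp -d)"
script="$workdir/export.sh"   # hypothetical stand-in for the real export.sh

printf '#!/bin/sh\necho "export ok"\n' > "$script"
chmod 0644 "$script"          # same mode the playbook applied

# File exists but has no execute bit: the shell reports 126.
rc_noexec=0
"$script" 2>/dev/null || rc_noexec=$?

# File does not exist at all: the shell reports 127.
rc_missing=0
"$workdir/missing.sh" 2>/dev/null || rc_missing=$?

# With the execute bit restored, the script runs normally.
chmod 0755 "$script"
rc_fixed=0
out_fixed="$("$script")" || rc_fixed=$?

echo "0644: $rc_noexec, missing: $rc_missing, 0755: $rc_fixed ($out_fixed)"
rm -rf "$workdir"
```

Seeing 126 rather than 127 is what rules out a wrong path or a missing file and points at permissions.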
The Fix
Immediate
# Add execute permission
chmod +x /opt/data-exporter/bin/export.sh
# Restart the service
systemctl restart data-exporter.service
# Verify the run succeeded
systemctl status data-exporter.service
Permanent (fix the Ansible playbook)
Change the mode in the Ansible task:
- name: Deploy data exporter binary
  ansible.builtin.copy:
    src: files/export.sh
    dest: /opt/data-exporter/bin/export.sh
    owner: exporter
    group: exporter
    mode: "0755"  # was 0644; needs execute bit
  notify: restart data-exporter
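The difference between 0644 and 0755 comes down to the three execute bits, which the octal mask 0111 selects. Below is a small sketch of that check, of the kind that could be run against mode values during review; mode_has_exec_bit is a hypothetical helper name, not part of Ansible.

```shell
# Return 0 if an octal mode string (e.g. "0755") has any execute bit set.
# mode_has_exec_bit is a hypothetical helper, not an Ansible feature.
mode_has_exec_bit() {
    # The leading 0 forces octal interpretation in shell arithmetic;
    # 0111 masks the owner, group, and other execute bits.
    [ $(( 0$1 & 0111 )) -ne 0 ]
}

for mode in 0644 0755 0700 0640; do
    if mode_has_exec_bit "$mode"; then
        echo "$mode: has an execute bit"
    else
        echo "$mode: no execute bit"
    fi
done
```

Any mode that fails this check is unsuitable for a file referenced directly by ExecStart.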
Verification
# Check the service ran successfully
systemctl status data-exporter.service
# Check the custom metric updated
curl -s localhost:9100/metrics | grep data_export_last_success
# Verify file permissions persist after next Ansible run
ansible-playbook ansible/playbooks/deploy-exporter.yml --check --diff
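The permission check can also be made directly on the host after a deploy. This is a sketch only: assert_executable is a hypothetical helper, and the demonstration runs against a scratch file rather than the real /opt/data-exporter/bin/export.sh.

```shell
# assert_executable is a hypothetical helper name for a post-deploy check.
assert_executable() {
    if [ -x "$1" ]; then
        echo "OK: $1 is executable"
    else
        echo "FAIL: $1 is missing or not executable" >&2
        return 1
    fi
}

# Demonstration against a scratch file rather than the real script.
demo="$(mktemp)"
chmod 0644 "$demo"
before=0
assert_executable "$demo" 2>/dev/null || before=$?   # expected to fail
chmod 0755 "$demo"
after=0
assert_executable "$demo" >/dev/null || after=$?     # expected to pass
echo "before fix: $before, after fix: $after"
rm -f "$demo"
```

On the batch server the same function would be called with the deployed script's path, so a regression surfaces at deploy time instead of at 02:00.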
Artifact Decoder
| Artifact | What It Revealed | What Was Misleading |
|---|---|---|
| CLI Output | Exit code 126 = permission denied; ls -la confirms no execute bit | The service name "data-exporter" tells you the purpose, but not why it broke |
| Metrics | Last success was 24h ago, which confirms exactly one missed run | Row counts look interesting but are from the previous successful run, not diagnostic |
| IaC Snippet | mode: "0644" is the root cause: Ansible deployed the script without execute permission | The notify: restart data-exporter handler works correctly, which is why the service attempted to run |
| Log Lines | Ansible changed at 18:45 confirms the file was redeployed between the last successful run and the failed one | The cron line for cleanup.sh is a red herring: it is a different script and ran fine |
Skills Demonstrated
- Interpreting exit codes reported by systemd (126: found but not executable; 127: command not found)
- Reading Unix file permissions
- Tracing deployment timelines across log sources
- Identifying Ansible configuration errors
- Distinguishing related-but-irrelevant system activity from root cause