Answer Key: The Service That Won't Start

The System

A nightly batch data export pipeline running on a dedicated batch server (prod-batch-01). A shell script (export.sh) is executed by a systemd service on a timer (or triggered by cron) to extract transaction and user event data from a database and export it — likely to S3, SFTP, or a data warehouse.

[systemd timer] --> [data-exporter.service]
                         |
                    /opt/data-exporter/bin/export.sh
                         |
                    [Database] --> [Export Target]
                         |
                    Pushes metrics: data_export_rows_total, data_export_last_success_timestamp

[crond] --> /opt/data-exporter/bin/cleanup.sh (separate maintenance job)

Ansible manages deployment of the script and systemd unit file. The system also has a separate cron job running cleanup.sh (probably removing old export files).

What's Broken

Root cause: The Ansible playbook deploy-exporter.yml sets the file mode to 0644 (read-write for the owner, read-only for group and others), so no one has execute permission. When systemd tries to run /opt/data-exporter/bin/export.sh as the ExecStart command, the kernel rejects the exec with EACCES (permission denied) and systemd reports exit code 126, the conventional status for a command that was found but could not be executed.

The playbook was run at 18:45 on Oct 1 (visible in the Ansible log), overwriting the script with correct content but wrong permissions. The next scheduled run at 02:00 on Oct 2 failed.

Key clue: Exit code 126 in the systemctl status output, combined with ls -la showing -rw-r--r-- (no x bit), and the Ansible task explicitly setting mode: "0644".
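
The failure mode can be reproduced safely outside production. The sketch below is illustrative only: it uses a scratch directory and a stand-in script (neither is part of the original incident), applies the same 0644 mode the playbook set, and records the exit status before and after adding the execute bit.

```shell
#!/bin/sh
# Reproduce "found but not executable" in a throwaway directory.
# The stand-in script and temp paths are assumptions for illustration.
workdir=$(mktemp -d)
script="$workdir/export.sh"

printf '#!/bin/sh\necho "export ok"\n' > "$script"
chmod 0644 "$script"            # same mode the playbook applied

# With no execute bit, the exec is refused and the shell reports the failure.
status_before=0
"$script" 2>/dev/null || status_before=$?
echo "mode 0644 -> exit status $status_before"

# Add the execute bit and try again.
chmod 0755 "$script"
status_after=0
output_after=$("$script") || status_after=$?
echo "mode 0755 -> exit status $status_after ($output_after)"

rm -rf "$workdir"
```

The first run should fail with the same status seen in the systemctl output; the second should succeed once the mode includes the x bit.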

The Fix

Immediate

# Add execute permission
chmod +x /opt/data-exporter/bin/export.sh

# Restart the service
systemctl restart data-exporter.service

# Verify it's running
systemctl status data-exporter.service

Permanent (fix the Ansible playbook)

Change the mode in the Ansible task:

- name: Deploy data exporter binary
  ansible.builtin.copy:
    src: files/export.sh
    dest: /opt/data-exporter/bin/export.sh
    owner: exporter
    group: exporter
    mode: "0755"    # was 0644 — needs execute bit
  notify: restart data-exporter

Verification

# Check the service ran successfully
systemctl status data-exporter.service

# Check the custom metric updated
curl -s localhost:9100/metrics | grep data_export_last_success

# Dry-run the playbook: with the fix in place it should report no pending mode change
ansible-playbook ansible/playbooks/deploy-exporter.yml --check --diff
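
A post-deploy guard could catch this class of error before the next scheduled run. The following is a minimal sketch, not part of the original runbook: check_executable is a hypothetical helper, and the demonstration runs against a temporary file rather than the real script.

```shell
#!/bin/sh
# check_executable PATH (hypothetical helper)
# Returns 0 if PATH is a regular file the current user can execute,
# 1 if it exists but is not executable, 2 if it is missing.
check_executable() {
    if [ ! -f "$1" ]; then
        echo "MISSING: $1"
        return 2
    fi
    if [ ! -x "$1" ]; then
        echo "NOT EXECUTABLE: $1"
        return 1
    fi
    echo "OK: $1"
}

# Demonstration on a scratch file (mktemp creates it without execute bits).
tmpfile=$(mktemp)

chmod 0644 "$tmpfile"
rc_0644=0
check_executable "$tmpfile" || rc_0644=$?

chmod 0755 "$tmpfile"
rc_0755=0
check_executable "$tmpfile" || rc_0755=$?

rm -f "$tmpfile"
rc_missing=0
check_executable "$tmpfile" || rc_missing=$?
```

In this scenario the helper would be pointed at /opt/data-exporter/bin/export.sh. Because the -x test reflects the permissions of whoever runs it, running the check as the service user (for example via sudo -u exporter) gives the most representative answer.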

Artifact Decoder

| Artifact | What It Revealed | What Was Misleading |
|---|---|---|
| CLI Output | Exit code 126 = permission denied; ls -la confirms no execute bit | The service name "data-exporter" tells you the purpose, but not why it broke |
| Metrics | Last success was 24h ago, which confirms exactly one missed run | Row counts look interesting but are from the previous successful run, not diagnostic |
| IaC Snippet | mode: "0644" is the root cause: Ansible deployed without execute permission | The notify: restart data-exporter handler works correctly, which is why the service attempted to run |
| Log Lines | Ansible changed at 18:45 confirms the file was redeployed between successful runs | The cron line for cleanup.sh is a red herring: it is a different script and ran fine |

Skills Demonstrated

  • Interpreting exit codes surfaced by systemd (126 = found but not executable vs 127 = command not found)
  • Reading Unix file permissions
  • Tracing deployment timelines across log sources
  • Identifying Ansible configuration errors
  • Distinguishing related-but-irrelevant system activity from root cause
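
The 126 vs 127 distinction in the first bullet can be seen directly in a shell. This sketch is an illustration under assumptions: describe_exit is a hypothetical helper, and both test paths live in a scratch directory created for the demo.

```shell
#!/bin/sh
# describe_exit STATUS (hypothetical helper)
# Translates the two reserved "could not run it" statuses into plain language.
describe_exit() {
    case "$1" in
        0)   echo "success" ;;
        126) echo "found but not executable (check permissions)" ;;
        127) echo "command not found (check the path)" ;;
        *)   echo "other non-zero status $1" ;;
    esac
}

workdir=$(mktemp -d)
touch "$workdir/no-exec.sh"
chmod 0644 "$workdir/no-exec.sh"     # exists, but no execute bit

# Case 1: the file exists but cannot be executed.
rc_noexec=0
"$workdir/no-exec.sh" 2>/dev/null || rc_noexec=$?

# Case 2: the file does not exist at all.
rc_missing=0
"$workdir/does-not-exist.sh" 2>/dev/null || rc_missing=$?

echo "no-exec.sh: $(describe_exit "$rc_noexec")"
echo "does-not-exist.sh: $(describe_exit "$rc_missing")"

rm -rf "$workdir"
```

A 127 would have pointed toward a wrong path or a missing deployment; the 126 in this incident pointed at permissions on a file that was present.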

Prerequisite Topic Packs