GitHub Actions - Street-Level Ops¶
What experienced GitHub Actions operators know that tutorials don't teach.
Quick Diagnosis Commands¶
# List recent workflow runs for a repo
gh run list --repo owner/repo --limit 20
# View a specific run's logs
gh run view <run-id> --log
# Watch a run in progress
gh run watch <run-id>
# Re-run only the failed jobs of a run
gh run rerun <run-id> --failed
# Re-run entire workflow
gh run rerun <run-id>
# List all workflows
gh workflow list --repo owner/repo
# Trigger a workflow manually (workflow_dispatch)
gh workflow run deploy.yml --repo owner/repo -f environment=staging
# View workflow run details (steps, timing)
gh run view <run-id> --repo owner/repo
# Download artifacts from a run
gh run download <run-id> --repo owner/repo --dir ./artifacts
# Check self-hosted runner status
gh api repos/owner/repo/actions/runners | jq '.runners[] | {name, status, busy}'
# List queued/in-progress runs
gh run list --repo owner/repo --status in_progress
gh run list --repo owner/repo --status queued
# Cancel a stuck run
gh run cancel <run-id> --repo owner/repo
Common Scenarios¶
Scenario 1: Workflow Stuck in Queued State¶
A workflow has been queued for 10+ minutes without starting.
Diagnosis:
# Check if runners are available and online
gh api repos/owner/repo/actions/runners | jq '.runners[] | {name, status, busy, labels}'
# Check org-level runners too (if applicable)
gh api orgs/myorg/actions/runners | jq '.runners[] | {name, status, busy}'
# Look at the job's runner requirements in the YAML
# The 'runs-on' label must match an available runner
# For GitHub-hosted: check GitHub status page
# https://www.githubstatus.com/
Common causes and fixes:
1. No matching runner label:
   - Job requires 'self-hosted, linux, gpu' but only 'self-hosted, linux' is registered
   - Fix: update runner labels or change the 'runs-on' value in the workflow
2. All self-hosted runners are busy or offline:
   - Scale up runner pool or wait
   - Check runner machine is up: ssh to runner host, check runner service
     sudo systemctl status actions.runner.*.service
     sudo journalctl -u actions.runner.*.service -n 50
3. Concurrency group blocking:
   - Another run holds the concurrency lock
   - gh run list --repo owner/repo --status in_progress
   - Cancel the blocking run or wait
4. GitHub-hosted runner availability (Actions outage):
   - Check https://www.githubstatus.com/
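Cause 1 can be checked mechanically: a job is only picked up by a runner that carries every label listed in runs-on. The following is a minimal sketch of that subset check. The function name labels_match is hypothetical, and the exact, case-sensitive comparison is a simplifying assumption.

```shell
#!/usr/bin/env bash
# labels_match REQUIRED REGISTERED   (hypothetical helper, not part of gh)
# Both arguments are comma-separated label lists.
# Succeeds only if every required label is present on the runner;
# otherwise prints the first missing label and fails.
labels_match() {
  local required="${1// /}"          # strip spaces around commas
  local registered=",${2// /},"      # wrap in commas so whole labels match
  local label
  local IFS=','
  for label in $required; do
    case "$registered" in
      *",$label,"*) ;;               # label present, keep checking
      *) echo "missing: $label"; return 1 ;;
    esac
  done
  return 0
}

# Example: the job's runs-on labels versus one runner's registered labels
# labels_match "self-hosted, linux, gpu" "self-hosted, linux"
```

Feed it the labels reported by the runners API call above to see which label keeps the job queued.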
Under the hood: GitHub-hosted runners are ephemeral VMs -- each job gets a fresh VM that is destroyed after the job completes. Self-hosted runners persist by default, which means file system state, Docker images, and tool versions accumulate between jobs. This is both a feature (faster warm cache) and a trap (dependency contamination between unrelated workflows).
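One way to limit that contamination on a self-hosted runner is to wipe the workspace before each job. This is a minimal sketch, assuming you wire it in as a pre-job script through the runner's ACTIONS_RUNNER_HOOK_JOB_STARTED setting. The function name clean_workspace is hypothetical, and Docker images or tool caches outside the workspace are not touched.

```shell
#!/usr/bin/env bash
# clean_workspace DIR   (hypothetical helper)
# Removes everything inside DIR, including dotfiles, but keeps DIR itself
# so the runner can reuse the directory for the next checkout.
clean_workspace() {
  local dir="${1:?usage: clean_workspace DIR}"
  [ -d "$dir" ] || return 0          # nothing to clean yet
  find "$dir" -mindepth 1 -delete
}

# In a pre-job hook script (assumes GITHUB_WORKSPACE is set for the hook):
# clean_workspace "$GITHUB_WORKSPACE"
```

The trade-off is the one the paragraph above describes: a clean workspace removes cross-job leftovers but also gives up the warm state that made the runner fast.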
Scenario 2: Secret Not Available in Workflow¶
A step fails because an env var is empty, even though you set the secret in the repo settings.
Diagnosis:
# Confirm the secret name matches exactly (case-sensitive)
gh secret list --repo owner/repo
# For org secrets, check if repo has access
gh api orgs/myorg/actions/secrets/MY_SECRET/repositories \
  | jq '.repositories[].name'
# Add a debug step that lists which variable names are set (names only, never values)
- name: Debug env
  run: env | cut -d= -f1 | grep -E '^(INPUT_|RUNNER_|GITHUB_)' | sort
Fix:
# Secrets are not exposed to steps automatically; map each one into env explicitly
steps:
  - run: deploy.sh
    env:
      API_KEY: ${{ secrets.API_KEY }}   # explicit mapping
# Environment secrets require the job to target the environment
jobs:
  deploy:
    environment: production   # required to access environment secrets
    steps:
      - run: deploy.sh
        env:
          PROD_KEY: ${{ secrets.PROD_KEY }}
OIDC for cloud auth (preferred over long-lived secrets):
permissions:
  id-token: write
  contents: read
steps:
  - uses: aws-actions/configure-aws-credentials@v4
    with:
      role-to-assume: arn:aws:iam::123456789012:role/github-actions-role
      aws-region: us-east-1
  - run: aws s3 ls
Scenario 3: Cache Miss Every Run¶
You set up actions/cache but cache hit rate is 0%.
Diagnosis:
# Check the cache key: if it includes content that changes every run, you'll always miss
- uses: actions/cache@v4
  with:
    path: ~/.npm
    key: ${{ runner.os }}-npm-${{ hashFiles('**/package-lock.json') }}
    # restore-keys provides fallback on partial match
    restore-keys: |
      ${{ runner.os }}-npm-
Common causes:
1. Key includes ${{ github.sha }}: changes every commit, always a miss
   Fix: key on lockfile hash, not commit SHA
2. Different runner OS (ubuntu-22 vs ubuntu-24): cache is OS-keyed
   Fix: pin runner version: ubuntu-22.04 not ubuntu-latest
3. Paths don't match: caching ~/.npm but restoring to ~/different/path
   Fix: verify 'path' matches where the tool actually writes
4. Cache eviction: GitHub evicts caches not accessed in 7 days, and the least recently used ones once the repo exceeds 10 GB total
   Fix: nothing to do; the next run misses, rebuilds, and saves the cache again
5. Branch-scoped caches: caches created on feature branches don't restore on main
   Fix: save the cache from a run on main; feature branches can then fall back to main's cache
Default trap: GitHub Actions caches are scoped to the branch where they were created. A cache saved on a feature branch is not available to main. But main's cache IS available to feature branches via restore-keys fallback. This means the first CI run on main after a long period always misses -- build a cache on main via a scheduled workflow to keep it warm.
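The lookup order behind this can be modelled in a few lines. The sketch below is a toy model, not the cache service. The CACHES array, the function names, and the hard-coded default branch main are all hypothetical, and it ignores pull request base branches and cache versions.

```shell
#!/usr/bin/env bash
# Toy model of branch-scoped cache lookup (illustrative only).
# Each entry is "branch key", listed oldest first.
CACHES=(
  "main Linux-npm-aaa"
  "feature-x Linux-npm-bbb"
)

# find_in_scope BRANCH KEY PREFIX
# Prints the key that would be restored from BRANCH, or fails on a miss.
find_in_scope() {
  local branch="$1" key="$2" prefix="$3" entry hit=""
  # 1) exact key match within this branch
  for entry in "${CACHES[@]}"; do
    if [ "${entry%% *}" = "$branch" ] && [ "${entry#* }" = "$key" ]; then
      echo "$key"
      return 0
    fi
  done
  # 2) restore-keys prefix match within this branch; the newest entry wins
  for entry in "${CACHES[@]}"; do
    [ "${entry%% *}" = "$branch" ] || continue
    case "${entry#* }" in
      "$prefix"*) hit="${entry#* }" ;;
    esac
  done
  [ -n "$hit" ] && echo "$hit"
}

# cache_lookup BRANCH KEY PREFIX
# Searches the current branch first, then the default branch (assumed: main).
cache_lookup() {
  find_in_scope "$1" "$2" "$3" || find_in_scope main "$2" "$3" || echo "miss"
}

cache_lookup feature-y Linux-npm-ccc Linux-npm-   # prints Linux-npm-aaa (falls back to main)
cache_lookup main Linux-npm-bbb Linux-npm-        # prints Linux-npm-aaa (feature-x cache is invisible)
```

The second call is the trap: the exact key exists, but only on feature-x, so main never sees it.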
Scenario 4: Matrix Build Partial Failure¶
A matrix strategy has 10 jobs; 2 fail, 8 pass. You want to rerun only the failures.
# Rerun only failed jobs
gh run rerun <run-id> --failed
# If you need to debug one matrix leg interactively, use tmate
- name: Debug via tmate
  if: failure()
  uses: mxschmitt/action-tmate@v3
  with:
    limit-access-to-actor: true
# To continue other matrix jobs when one fails (fail-fast: false)
strategy:
  fail-fast: false
  matrix:
    os: [ubuntu-latest, macos-latest, windows-latest]
    node: [18, 20, 22]
Key Patterns¶
Workflow Trigger Best Practices¶
on:
  push:
    branches: [main]
    paths:
      - 'src/**'
      - 'package*.json'   # skip CI when only docs change
  pull_request:
    branches: [main]
    types: [opened, synchronize, reopened]
  workflow_dispatch:   # manual trigger with optional inputs
    inputs:
      environment:
        description: 'Target environment'
        required: true
        default: 'staging'
        type: choice
        options: [staging, production]
  schedule:
    - cron: '0 6 * * 1'   # Monday 6 AM UTC for weekly jobs
Concurrency Control¶
# Cancel in-progress runs for the same PR/branch (safe for CI, dangerous for deploy)
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
# For deployments: let the current deploy finish instead of cancelling it
concurrency:
  group: deploy-${{ inputs.environment }}
  cancel-in-progress: false   # new run waits as pending; only the newest pending run per group is kept
Reusable Workflows¶
# Caller workflow
jobs:
  call-deploy:
    needs: build   # required for needs.build.outputs.version below
    uses: myorg/workflows/.github/workflows/deploy.yml@main
    with:
      environment: production
      version: ${{ needs.build.outputs.version }}
    secrets: inherit   # or explicitly: secrets: { TOKEN: ${{ secrets.TOKEN }} }
# Reusable workflow definition
on:
  workflow_call:
    inputs:
      environment:
        required: true
        type: string
      version:   # every input the caller passes must be declared here
        required: true
        type: string
    secrets:
      TOKEN:
        required: true
Artifact Upload/Download¶
# Upload build artifacts
- uses: actions/upload-artifact@v4
  with:
    name: dist-${{ github.sha }}
    path: dist/
    retention-days: 7
# Download in a subsequent job
jobs:
  build:
    outputs:
      artifact-name: dist-${{ github.sha }}
    steps:
      - uses: actions/upload-artifact@v4
        with:
          name: dist-${{ github.sha }}
          path: dist/
  deploy:
    needs: build
    steps:
      - uses: actions/download-artifact@v4
        with:
          name: dist-${{ github.sha }}
          path: ./dist
Environment Protection Rules¶
# Workflow targets environment: triggers required reviewers, wait timers
jobs:
  deploy-prod:
    environment:
      name: production
      url: https://myapp.example.com
    steps:
      - run: ./deploy.sh production
Configure in GitHub: Settings → Environments → production → Required reviewers, wait timer (max 30 days), deployment branch policy.
Self-Hosted Runner Registration¶
# Register a new runner (runner machine)
mkdir -p ~/actions-runner && cd ~/actions-runner
curl -o actions-runner-linux-x64.tar.gz -L \
  https://github.com/actions/runner/releases/download/v2.317.0/actions-runner-linux-x64-2.317.0.tar.gz
tar xzf ./actions-runner-linux-x64.tar.gz
./config.sh --url https://github.com/owner/repo --token <TOKEN>
# Install as service
sudo ./svc.sh install
sudo ./svc.sh start
sudo systemctl status actions.runner.*.service
# Runner logs
sudo journalctl -u actions.runner.*.service -f
# Remove a runner
./config.sh remove --token <TOKEN>
Rate Limits and API Throttling¶
# Check your current rate limit status
gh api rate_limit | jq '.rate'
# GitHub Actions limits:
# - 1,000 API requests per hour per repo when using GITHUB_TOKEN
# - Concurrent jobs on GitHub-hosted runners depend on the plan (20 on Free)
# - 256 jobs per matrix per workflow run
# - 6 hours max job runtime (GitHub-hosted)
# - 35 days max workflow run duration
# - 90 days default artifact and log retention (configurable)
# If hitting rate limits in workflows:
# - Cache aggressively (hashFiles on lockfiles)
# - Use github.token where its scope suffices: its budget is per repo and separate from a PAT's 5,000 requests per hour per user
# - Batch API calls where possible
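When a script does run close to the limit, the remaining and reset fields of the rate_limit response are enough to decide whether and how long to pause. The following is a minimal sketch. The function names and the threshold of 50 are hypothetical, and the gh calls appear only as commented usage.

```shell
#!/usr/bin/env bash
# seconds_until_reset RESET_EPOCH [NOW_EPOCH]   (hypothetical helper)
# Prints how many seconds remain until the rate-limit window resets,
# or 0 if the reset time has already passed.
seconds_until_reset() {
  local reset="$1" now="${2:-$(date +%s)}"
  local delay=$(( reset - now ))
  if [ "$delay" -gt 0 ]; then
    echo "$delay"
  else
    echo 0
  fi
}

# should_pause REMAINING THRESHOLD   (hypothetical helper)
# Succeeds when the remaining request budget is at or below the threshold.
should_pause() {
  [ "$1" -le "$2" ]
}

# Usage with the gh CLI:
# remaining=$(gh api rate_limit --jq '.rate.remaining')
# reset=$(gh api rate_limit --jq '.rate.reset')
# if should_pause "$remaining" 50; then sleep "$(seconds_until_reset "$reset")"; fi
```

Pausing until the reset is usually cheaper than retrying failed calls, since failed requests still count against the same budget.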
Debugging with act (local runner)¶
# Install act
brew install act # macOS
# or: curl https://raw.githubusercontent.com/nektos/act/master/install.sh | sudo bash
# List available jobs
act -l
# Run a specific job locally
act push -j build
# With secrets file
act push -j build --secret-file .secrets
# Use a specific runner image (default is micro, use medium for more tools)
act push --platform ubuntu-latest=ghcr.io/catthehacker/ubuntu:act-latest