
GitHub Actions: CI/CD That Lives in Your Repo

  • lesson
  • github-actions
  • ci/cd-pipelines
  • docker
  • oidc
  • supply-chain-security
  • caching
  • secrets-management

Topics: GitHub Actions, CI/CD pipelines, Docker, OIDC, supply chain security, caching, secrets management
Level: L1–L2 (Foundations → Operations)
Time: 75–90 minutes
Prerequisites: None (Git basics help, but everything is explained)


The Mission

Your team just moved a Python API from a manually deployed VM to containers. There's no CI. Developers push to main, SSH into the server, run docker build, and pray. Last Tuesday someone deployed without running tests and broke authentication for 4,000 users.

Your job: build a CI/CD pipeline from scratch in GitHub Actions. By the end of this lesson you'll have a workflow that lints, tests, builds a Docker image, and deploys to staging and production — with no static credentials anywhere, matrix testing across Python versions, and security hardening that would survive a hostile fork.

We're going to build it piece by piece. Each section adds one capability to the pipeline, so you see why each feature exists before you use it.


Part 1: Anatomy of a Workflow File

Everything in GitHub Actions starts with a YAML file in .github/workflows/. That's it. No external server, no Jenkins install, no webhook configuration. The CI/CD lives in the same repo as the code.

# .github/workflows/ci.yml
name: CI Pipeline

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: echo "Hello from CI"

That's a working pipeline. Push it and GitHub runs it. Let's break down every piece:

| Line | What it does |
|------|--------------|
| `name: CI Pipeline` | Display name in the Actions tab — cosmetic but helpful |
| `on: push` | Trigger: run this workflow when code is pushed |
| `branches: [main]` | Only for pushes to main (skip feature branches here) |
| `on: pull_request` | Also run on PRs targeting main — your gate before merge |
| `jobs:` | Container for all jobs; each job gets its own fresh VM |
| `runs-on: ubuntu-latest` | The runner — a GitHub-hosted 2-core, 7 GB RAM Azure VM |
| `uses: actions/checkout@v4` | Clone your repo into the runner (the most-used action in existence) |
| `run:` | Execute a shell command |

Trivia: The first version of GitHub Actions (2018) used HCL, not YAML. GitHub switched to YAML for the public launch in 2019, citing broader developer familiarity. The decision was controversial — many developers found YAML verbose for complex logic — but it won because nearly everyone already knew YAML from Docker Compose, Kubernetes manifests, and Ansible playbooks.

Under the Hood: GitHub-hosted runners are ephemeral Azure VMs. Each job gets a fresh VM that's destroyed after the job completes. Nothing persists between jobs — no files, no Docker images, no installed packages. This is a feature: it prevents one workflow from contaminating another. It's also why caching matters (we'll get there).
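A tiny hypothetical two-job workflow makes the isolation visible — the second job starts on a brand-new VM, so the file written by the first job is simply gone:

```yaml
jobs:
  write:
    runs-on: ubuntu-latest
    steps:
      - run: echo "state" > /tmp/state.txt      # exists only on this job's VM

  read:
    runs-on: ubuntu-latest
    needs: write                                 # runs after write — but on a fresh VM
    steps:
      - run: test -f /tmp/state.txt || echo "file is gone; use artifacts to pass data"
```

The `read` job always prints the fallback message, which is exactly why artifacts exist.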

The trigger zoo

GitHub Actions has more trigger types than most people realize:

on:
  push:                          # code pushed
    branches: [main]
    paths: ['src/**', 'tests/**']  # only when these files change
  pull_request:                  # PR opened, updated, or reopened
    types: [opened, synchronize, reopened]
  schedule:
    - cron: '0 6 * * 1'         # Monday 6am UTC (cron syntax)
  workflow_dispatch:             # manual "Run workflow" button
    inputs:
      environment:
        type: choice
        options: [staging, production]
  workflow_call:                 # called by another workflow (reusable)
  release:
    types: [published]           # when a GitHub Release is created

The paths filter is an underrated time-saver. If you only changed a README, why rebuild a Docker image? Skip CI on doc-only changes and save minutes (which are literally money on private repos).
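The inverse filter also exists: `paths-ignore` skips the run when only the listed files changed. A minimal sketch for skipping doc-only commits:

```yaml
on:
  push:
    branches: [main]
    paths-ignore:
      - '**.md'       # doc-only changes skip CI entirely
      - 'docs/**'
```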


Part 2: Build the Real Pipeline — Lint, Test, Build

Time to replace that echo with something useful. Here's a pipeline for a Python API:

name: CI Pipeline

on:
  push:
    branches: [main]
    paths:
      - 'app/**'
      - 'tests/**'
      - 'requirements*.txt'
      - 'Dockerfile'
      - '.github/workflows/ci.yml'
  pull_request:
    branches: [main]

permissions:
  contents: read          # principle of least privilege — more on this later

jobs:
  lint:
    runs-on: ubuntu-latest
    timeout-minutes: 5
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
          cache: 'pip'              # built-in pip cache — no actions/cache needed

      - name: Install linter
        run: pip install ruff

      - name: Lint
        run: ruff check app/ tests/

  test:
    runs-on: ubuntu-latest
    timeout-minutes: 10
    needs: lint                     # only run if lint passes
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
          cache: 'pip'

      - name: Install dependencies
        run: pip install -r requirements.txt -r requirements-test.txt

      - name: Run tests with coverage
        run: pytest --cov=app --cov-fail-under=90 -v

      - name: Upload coverage report
        uses: actions/upload-artifact@v4
        with:
          name: coverage-report
          path: htmlcov/
          retention-days: 7

A few things worth noticing:

timeout-minutes — always set this. The default is 360 minutes (6 hours). If your test suite hangs, you don't want to find out when you get the bill.

needs: lint — creates a dependency chain. test won't start until lint passes. If lint fails, test is skipped. This saves minutes and makes failures faster to diagnose.

cache: 'pip' — the setup-python action has built-in caching. One line replaces what used to be 10 lines of actions/cache configuration. Most setup-* actions now have this feature (Node, Python, Go, Java).

upload-artifact — saves the coverage report as a downloadable file attached to the workflow run. Artifacts are how you pass data between jobs (each job runs on a separate VM, remember).

Gotcha: Artifacts are not caches. Caches persist across workflow runs (for the same branch) and are used to speed up dependency installation. Artifacts are output from a specific run — test reports, built binaries, Docker images — and expire after a configurable retention period. Mixing them up leads to either missing data or bloated cache storage.
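Side by side, the two look like this (a sketch — the cache key and paths are illustrative):

```yaml
      # Cache: keyed by lockfile hash, restored by future runs on the same branch
      - uses: actions/cache@v4
        with:
          path: ~/.cache/pip
          key: pip-${{ runner.os }}-${{ hashFiles('requirements*.txt') }}

      # Artifact: output of THIS run, downloadable for the retention period
      - uses: actions/upload-artifact@v4
        with:
          name: coverage-report
          path: htmlcov/
```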

Flashcard check

| Question | Answer |
|----------|--------|
| What is the default timeout for a GitHub Actions job? | 360 minutes (6 hours) |
| What does `needs: lint` do? | Creates a dependency — the job only runs if lint succeeds |
| Where do workflow files live? | `.github/workflows/*.yml` in the repository |
| What's the difference between a cache and an artifact? | Cache persists across runs (speeds up installs). Artifact is output from one run (reports, binaries). |

Part 3: Matrix Builds — Test Everything at Once

Your API needs to work on Python 3.10, 3.11, and 3.12. You could copy-paste the test job three times, or you could use a matrix:

  test:
    runs-on: ubuntu-latest
    timeout-minutes: 10
    needs: lint
    strategy:
      fail-fast: false              # don't cancel other versions if one fails
      matrix:
        python-version: ['3.10', '3.11', '3.12']
        include:
          - python-version: '3.12'
            coverage: true          # only measure coverage on latest version

    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
          cache: 'pip'

      - name: Install dependencies
        run: pip install -r requirements.txt -r requirements-test.txt

      - name: Run tests
        run: pytest -v

      - name: Run tests with coverage
        if: matrix.coverage
        run: pytest --cov=app --cov-fail-under=90 --cov-report=xml

Three lines of YAML create three parallel jobs. Add an os dimension and you're testing across operating systems too:

    strategy:
      matrix:
        os: [ubuntu-latest, macos-latest]
        python-version: ['3.11', '3.12']
        exclude:
          - os: macos-latest
            python-version: '3.11'  # skip one combination to save minutes

That's 3 combinations (2x2 minus 1 exclusion), each running on its own VM in parallel.

Trivia: A single matrix definition with 5 OS options, 4 language versions, and 3 dependency variants generates 60 parallel jobs. Before matrix builds, this required hundreds of lines of copy-pasted YAML. GitHub allows up to 256 jobs per workflow run.

Gotcha: fail-fast: true is the default, and it's a trap for matrix builds. One failing Python version cancels the other two, hiding failures. You fix the one you saw, re-run, discover another failure, fix it, re-run again — three iterations instead of one. Set fail-fast: false for test matrices. Use fail-fast: true only when all legs are identical (parallel shards of the same suite).


Part 4: Expression Syntax and Contexts

GitHub Actions has its own expression language inside ${{ }}. It's not shell — it's evaluated by the GitHub runner before the shell ever sees it.

# Contexts — where data comes from
${{ github.sha }}                      # full commit SHA
${{ github.ref_name }}                 # branch name (e.g., "main")
${{ github.event.pull_request.number }} # PR number
${{ secrets.MY_SECRET }}               # repository or org secret
${{ vars.MY_VARIABLE }}                # non-secret config variable
${{ needs.build.outputs.version }}     # output from another job
${{ runner.os }}                       # "Linux", "macOS", or "Windows"
${{ hashFiles('**/requirements.txt') }} # SHA-256 of file contents (for cache keys)

# Conditionals
${{ github.ref == 'refs/heads/main' }}  # true on main branch
${{ contains(github.event.head_commit.message, '[skip ci]') }}

# Status check functions
${{ success() }}    # default — previous steps all passed
${{ failure() }}    # at least one previous step failed
${{ always() }}     # run regardless (cleanup steps)
${{ cancelled() }}  # workflow was cancelled
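These status functions are typically used as step-level `if:` conditions — for example, a cleanup step that runs whether or not the tests passed (a sketch; the step names and paths are illustrative):

```yaml
      - name: Run tests
        run: pytest -v

      - name: Upload logs on failure
        if: failure()                 # only when an earlier step failed
        uses: actions/upload-artifact@v4
        with:
          name: test-logs
          path: logs/

      - name: Tear down test database
        if: always()                  # cleanup runs pass or fail
        run: docker compose down -v
```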

Mental Model: Think of ${{ }} expressions as a template engine, not a programming language. They're evaluated before the shell runs. This means ${{ secrets.TOKEN }} gets replaced with the literal secret value in the command string. That's why you should pass secrets as environment variables (env: TOKEN: ${{ secrets.TOKEN }}) instead of inlining them in run: commands — it avoids shell expansion surprises and keeps secrets out of process listings.
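In practice, the contrast looks like this (`TOKEN` and the URL are hypothetical):

```yaml
      # Risky: the secret is spliced into the command string before the shell runs
      - run: curl -H "Authorization: Bearer ${{ secrets.TOKEN }}" https://api.example.com

      # Better: the shell reads the secret from the environment at runtime
      - env:
          TOKEN: ${{ secrets.TOKEN }}
        run: curl -H "Authorization: Bearer $TOKEN" https://api.example.com
```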


Part 5: Build and Push a Docker Image

Now we need to package the app. Here's the Docker build-and-push job:

  build-image:
    runs-on: ubuntu-latest
    needs: test
    if: github.ref == 'refs/heads/main'   # only build images on main
    permissions:
      contents: read
      packages: write                     # needed to push to GHCR
    outputs:
      image-tag: ${{ steps.meta.outputs.version }}

    steps:
      - uses: actions/checkout@v4

      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}   # auto-provided, no setup needed

      - uses: docker/metadata-action@v5
        id: meta
        with:
          images: ghcr.io/${{ github.repository }}
          # NB: comments inside a "|" block scalar become part of the value,
          # so the tag patterns below carry no inline comments
          tags: |
            type=sha,prefix=
            type=ref,event=branch
            type=semver,pattern={{version}}

      - uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha            # GitHub Actions cache for Docker layers
          cache-to: type=gha,mode=max

Let's unpack the interesting bits:

secrets.GITHUB_TOKEN — this is automatically generated for every workflow run. No setup required. It's scoped to the repository and expires when the job ends. The packages: write permission lets it push to GitHub Container Registry (GHCR).

docker/metadata-action — generates smart tags. A push to main gets tagged with the short SHA and main. A release tag like v1.2.3 also gets 1.2.3. No more hand-crafting tag logic.

cache-from: type=gha — uses GitHub Actions cache backend for Docker layer caching. This means your second build only rebuilds the layers that changed. A full build that takes 4 minutes might take 30 seconds on the next run if only application code changed and dependencies are cached.

Under the Hood: Docker layer caching with type=gha stores layer blobs in the same cache infrastructure as actions/cache. The mode=max option caches all layers (not just the final image layers), which means even intermediate build stages get cached. This is especially valuable for multi-stage Dockerfiles where the dependency-install stage rarely changes.
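A sketch of the kind of multi-stage Dockerfile that benefits most (the paths and module name are assumptions for this lesson's API):

```dockerfile
# deps stage changes rarely, so its layers stay cached across builds
FROM python:3.12-slim AS deps
WORKDIR /app
COPY requirements.txt .
RUN pip install --prefix=/install -r requirements.txt

FROM python:3.12-slim
WORKDIR /app
COPY --from=deps /install /usr/local
COPY app/ app/                       # only this layer rebuilds on code-only changes
CMD ["python", "-m", "app"]
```

With `mode=max`, the `deps` stage is cached too, so a code-only push skips the slow `pip install` entirely.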


Part 6: The OIDC Trick — No Static Credentials, Ever

This is the single most important security improvement you can make to any CI/CD pipeline. Instead of storing AWS access keys (or GCP service account keys) as repository secrets, you use OpenID Connect (OIDC) to let GitHub Actions assume a cloud IAM role directly.

No keys to rotate. No keys to leak. No keys at all.

  deploy-staging:
    runs-on: ubuntu-latest
    needs: build-image
    environment: staging                  # gates + environment-specific secrets
    permissions:
      id-token: write                     # THIS is what enables OIDC
      contents: read

    steps:
      - uses: actions/checkout@v4

      - name: Configure AWS credentials via OIDC
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/github-actions-staging
          aws-region: us-east-1
          # That's it. No access key. No secret key. Just the role ARN.

      - name: Deploy to ECS
        run: |
          aws ecs update-service \
            --cluster api-staging \
            --service api \
            --force-new-deployment

How OIDC works (the 30-second version)

1. Your workflow requests a JWT from GitHub's OIDC provider
2. The JWT contains claims: repo name, branch, workflow, actor
3. AWS (or GCP/Azure) validates the JWT against GitHub's public keys
4. If the claims match the role's trust policy, AWS issues temp credentials
5. Credentials expire when the job ends — usually 1 hour

On the AWS side, you create an IAM role with a trust policy like this:

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {
      "Federated": "arn:aws:iam::123456789012:oidc-provider/token.actions.githubusercontent.com"
    },
    "Action": "sts:AssumeRoleWithWebIdentity",
    "Condition": {
      "StringEquals": {
        "token.actions.githubusercontent.com:aud": "sts.amazonaws.com"
      },
      "StringLike": {
        "token.actions.githubusercontent.com:sub": "repo:your-org/your-repo:ref:refs/heads/main"
      }
    }
  }]
}

The sub condition is your security boundary. It restricts which repo, branch, and environment can assume the role. A PR from a fork can't assume your production role because its sub claim won't match.
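Common sub claim shapes you can match in the trust policy (these follow GitHub's documented `repo:…` formats; swap in your own org and repo):

```
repo:your-org/your-repo:ref:refs/heads/main       # pushes to main only
repo:your-org/your-repo:ref:refs/tags/v*          # version tags (use StringLike)
repo:your-org/your-repo:environment:production    # jobs gated by the production environment
repo:your-org/your-repo:pull_request              # PR-triggered runs
```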

Remember: OIDC = "prove who you are without a shared secret." The same pattern works for GCP (google-github-actions/auth), Azure (azure/login), and HashiCorp Vault. If your pipeline still has AWS_ACCESS_KEY_ID in repository secrets, replacing it with OIDC is the highest-leverage security improvement you can make today.

Interview Bridge: "How do you authenticate CI/CD pipelines to cloud providers without static credentials?" OIDC federation is the answer. It comes up in almost every DevOps interview that touches cloud security.

Flashcard check

| Question | Answer |
|----------|--------|
| What permission does a workflow need to use OIDC? | `id-token: write` |
| What does OIDC replace in CI/CD? | Static cloud credentials (access keys, service account keys) |
| Where is the security boundary for OIDC defined? | In the cloud IAM role's trust policy (`sub` condition) |
| What happens to OIDC-issued credentials when the job ends? | They expire (typically within 1 hour) |

Part 7: Reusable Workflows vs. Composite Actions

As your org grows, you'll want to share pipeline logic across repos. GitHub gives you two mechanisms, and they solve different problems.

Reusable workflows — share entire jobs

# .github/workflows/reusable-deploy.yml (in a central repo)
name: Deploy
on:
  workflow_call:
    inputs:
      environment:
        required: true
        type: string
      image-tag:
        required: true
        type: string
    secrets:
      DEPLOY_TOKEN:
        required: true

jobs:
  deploy:
    runs-on: ubuntu-latest
    environment: ${{ inputs.environment }}
    steps:
      - uses: actions/checkout@v4
      - name: Deploy to ${{ inputs.environment }}
        env:
          DEPLOY_TOKEN: ${{ secrets.DEPLOY_TOKEN }}
        run: |
          helm upgrade api ./chart \
            --set image.tag=${{ inputs.image-tag }} \
            -f values-${{ inputs.environment }}.yaml
# Caller workflow (in any repo)
jobs:
  deploy-staging:
    uses: your-org/shared-workflows/.github/workflows/reusable-deploy.yml@main
    with:
      environment: staging
      image-tag: ${{ needs.build.outputs.image-tag }}
    secrets:
      DEPLOY_TOKEN: ${{ secrets.STAGING_TOKEN }}

Composite actions — share reusable steps

# .github/actions/setup-python-app/action.yml (in your repo or a shared repo)
name: 'Setup Python App'
description: 'Install Python, dependencies, and lint tools'
inputs:
  python-version:
    required: false
    default: '3.12'
outputs:
  coverage:
    description: 'Test coverage percentage'
    value: ${{ steps.test.outputs.coverage }}
runs:
  using: 'composite'
  steps:
    - uses: actions/setup-python@v5
      with:
        python-version: ${{ inputs.python-version }}
        cache: 'pip'
    - run: pip install -r requirements.txt -r requirements-test.txt
      shell: bash
    - id: test
      run: |
        COV=$(pytest --cov=app --cov-report=term | grep TOTAL | awk '{print $4}')
        echo "coverage=$COV" >> "$GITHUB_OUTPUT"
      shell: bash
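Calling the composite action from a workflow in the same repo looks like this (checkout must come first, because the action lives in the repo itself):

```yaml
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - id: setup
        uses: ./.github/actions/setup-python-app   # local composite action
        with:
          python-version: '3.11'
      - run: echo "Coverage was ${{ steps.setup.outputs.coverage }}"
```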

When to use which

| | Reusable workflow | Composite action |
|---|---|---|
| Scope | Entire job (with its own `runs-on`) | Steps within a job |
| Runner | Runs on its own runner | Runs on the caller's runner |
| Secrets | Can receive secrets via `secrets:` | Inherits caller's environment |
| Nesting | Can call other reusable workflows (max 4 deep) | Can use other actions |
| Use when | Standardizing deployment patterns across repos | Packaging repeated step sequences |
| Think of it as | A function that brings its own VM | A function that runs in your VM |

Mental Model: Reusable workflows are like microservices — self-contained, with their own environment. Composite actions are like library functions — they run inside your process. Use reusable workflows when you need isolation (deployments, security scans). Use composite actions when you need speed (no extra VM boot time).


Part 8: The Fork That Leaked Secrets

Time for a war story that will make you audit your workflows tonight.

War Story: In 2021, security researchers demonstrated an attack pattern they called "pwn requests." The target: workflows using pull_request_target that checked out the PR's code. Here's how it works.

A public repo has a workflow triggered on pull_request_target. Unlike the normal pull_request event, pull_request_target runs with the base branch's workflow file but gives the job write permissions and access to secrets. The intention is for safe operations like labeling PRs.

But the workflow also runs actions/checkout with ref: ${{ github.event.pull_request.head.sha }}, pulling in the fork's code. The attacker submits a PR that modifies a build script or package.json post-install hook to exfiltrate secrets. The workflow runs the attacker's code with full secrets access.

This pattern compromised repositories belonging to Apache, Eclipse, and multiple CNCF projects. GitHub's security blog documented it explicitly. The fix: never check out untrusted code in a pull_request_target workflow. Use pull_request_target only for metadata operations (labeling, commenting). Run builds with the pull_request event, which doesn't get secrets from forks. This is documented in GitHub's security hardening guide for Actions.
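In workflow terms, the vulnerable shape looks like this — a sketch of the anti-pattern, not something to copy:

```yaml
# DANGEROUS — do not use
on: pull_request_target                # runs with secrets and a write token
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ github.event.pull_request.head.sha }}  # attacker-controlled code
      - run: npm install && npm test   # runs the fork's install hooks with secrets in scope
```

The safe version of this build is identical except for the trigger: `on: pull_request`, where fork PRs get no secrets and a read-only token.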

This is why the permissions key exists. Declare the minimum your workflow needs:

# Bad — gives write access to everything
permissions: write-all

# Good — explicit least privilege
permissions:
  contents: read
  packages: write     # only if pushing to GHCR
  id-token: write     # only if using OIDC

Gotcha: Before February 2023, the default GITHUB_TOKEN permissions were write-all. Repos created before that date still have the old default unless an admin changed it. Check your repo: Settings → Actions → General → Workflow permissions. New repos default to read-only, but inherited org settings may override this.


Part 9: Security Hardening Checklist

Here's the checklist. Print it. Tape it to your monitor. Review it before every workflow change.

## GitHub Actions Security Hardening

### Credentials
- [ ] Use OIDC for cloud auth — no static AWS/GCP/Azure keys in secrets
- [ ] Set minimum GITHUB_TOKEN permissions per job (not workflow-level write-all)
- [ ] Use environment protection rules for production deploys
- [ ] Rotate any remaining static secrets on a schedule

### Supply Chain
- [ ] Pin third-party actions to full SHA (not mutable tags)
      uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11  # v4.1.1
- [ ] Enable Dependabot for GitHub Actions version updates
- [ ] Use actions only from trusted publishers or your own org
- [ ] Audit new actions before adding (read their source code)

### Fork Safety
- [ ] Never use pull_request_target + checkout of PR code
- [ ] Require approval for first-time contributors' workflow runs
- [ ] Don't use self-hosted runners on public repos

### Secrets Hygiene
- [ ] Never interpolate secrets in `run:` commands — use `env:` mapping
- [ ] Never echo or log secret values (even masked ones can leak via encoding)
- [ ] If a secret leaks, rotate immediately — logs may be cached externally

### Branch Protection
- [ ] Require status checks to pass before merge
- [ ] Require PR review (prevents direct pushes that skip CI)
- [ ] Enable branch protection for the default branch

War Story: In March 2025, the popular tj-actions/changed-files action was compromised via a supply chain attack. The attacker retroactively moved the action's version tags — including v35 — to point at a commit that dumped CI secrets from runner memory into build logs. Organizations referencing a mutable tag ran the compromised version. Organizations that pinned to a specific SHA were completely unaffected. This incident, along with the earlier Codecov Bash Uploader compromise in 2021, is why security teams now mandate SHA pinning for all third-party actions.


Part 10: Debugging Failed Workflows

Your workflow is failing. The logs are 2,000 lines of noise. Here's the systematic approach.

1. Start with gh CLI — faster than the web UI

# List recent failures
gh run list --status failure --limit 10

# View the failed run
gh run view 12345678

# View only the failed job's logs (skip the successful ones)
gh run view 12345678 --log-failed

# Rerun only the failed jobs (not the whole workflow)
gh run rerun 12345678 --failed

# Watch a running workflow in real-time
gh run watch 12345678

2. Enable debug logging

Add these as repository secrets (configuration variables also work):

ACTIONS_RUNNER_DEBUG = true
ACTIONS_STEP_DEBUG = true

This dumps verbose runner and step output. Remove them when done — the logs get enormous.

3. Test locally with act

# Install act (runs workflows locally using Docker)
brew install act          # macOS
# or: curl -s https://raw.githubusercontent.com/nektos/act/master/install.sh | sudo bash

# List available jobs
act -l

# Run a specific job
act push -j lint

# With a secrets file (.secrets, one per line: KEY=value)
act push -j test --secret-file .secrets

# Use a fuller runner image (default is minimal)
act push --platform ubuntu-latest=ghcr.io/catthehacker/ubuntu:act-latest

Gotcha: act doesn't perfectly replicate GitHub Actions. Services, OIDC, caching, and some GITHUB_* context variables don't work. It's excellent for testing shell commands and basic logic, but don't rely on it for caching or auth workflows. Think of it as a fast feedback loop for the 80% case.

4. The SSH debug escape hatch

When logs aren't enough, you can SSH into a running runner:

      - name: Debug via SSH
        if: failure()
        uses: mxschmitt/action-tmate@v3
        with:
          limit-access-to-actor: true   # only the person who triggered the run can connect

This pauses the workflow and gives you a tmate SSH URL. You get a shell on the actual runner with the full environment. Poke around, check environment variables, test commands interactively. Don't leave it in production workflows — it blocks the runner until the timeout.


Part 11: Common Workflow Patterns

CI — run on every PR

on:
  pull_request:
    branches: [main]
concurrency:
  group: ci-${{ github.ref }}
  cancel-in-progress: true      # cancel stale runs when new commits push

CD — deploy on merge to main

on:
  push:
    branches: [main]
concurrency:
  group: deploy-production
  cancel-in-progress: false     # NEVER cancel in-progress deploys

Gotcha: cancel-in-progress: true on deployment workflows can leave your infrastructure in a half-migrated state. A database migration gets cancelled mid-flight, your schema is half-applied, and the app crashes. Use cancel-in-progress: true for CI (tests are safe to cancel). Use cancel-in-progress: false for anything that mutates state.

Scheduled — weekly dependency updates, stale cache warming

on:
  schedule:
    - cron: '0 6 * * 1'        # Monday 6am UTC

Release — triggered by publishing a GitHub Release

on:
  release:
    types: [published]
jobs:
  publish:
    steps:
      - run: echo "Publishing ${{ github.event.release.tag_name }}"

Part 12: Cost Management

GitHub Actions bills by the minute for private repos. Public repos get unlimited free minutes.

| Runner type | Cost per minute (private repos) | vCPUs | RAM |
|-------------|--------------------------------|-------|-----|
| `ubuntu-latest` | $0.008 | 2 | 7 GB |
| `ubuntu-latest` (4-core) | $0.016 | 4 | 16 GB |
| `macos-latest` | $0.08 | 3 | 14 GB |
| `windows-latest` | $0.016 | 2 | 7 GB |

macOS runners cost 10x Linux runners. Windows costs 2x. If your matrix tests macOS + Windows + Linux across 4 language versions, your monthly bill might surprise you.
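To see how that compounds, a quick back-of-the-envelope in Python — the per-minute rates come from the table above; the job lengths and run counts are made-up assumptions:

```python
# Per-minute rates for GitHub-hosted runners on private repos (from the table above)
RATES = {"ubuntu": 0.008, "windows": 0.016, "macos": 0.08}

def run_cost(minutes_per_job: int, jobs_per_os: dict) -> float:
    """Cost of one workflow run, summed across matrix legs."""
    return sum(RATES[os] * minutes_per_job * n for os, n in jobs_per_os.items())

# Hypothetical matrix: 4 language versions per OS, 10-minute jobs
per_run = run_cost(10, {"ubuntu": 4, "windows": 4, "macos": 4})
print(f"per run: ${per_run:.2f}")               # per run: $4.16 (macOS legs are $3.20 of it)
print(f"300 runs/month: ${per_run * 300:.2f}")  # 300 runs/month: $1248.00
```

Dropping macOS from PR builds and keeping it only on main cuts that hypothetical bill by roughly three quarters.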

Cost reduction tactics:

  • Path filters — skip CI for doc-only changes
  • cancel-in-progress: true on CI — don't test stale pushes
  • Caching — pip, npm, Docker layers, apt packages
  • Matrix exclusions — do you really need macOS CI on every PR, or just main?
  • Timeout limits — a stuck job at the default 6-hour timeout eats 360 × $0.008 = $2.88. Not much once, but a flaky workflow that hangs 10 times a week burns over $100 a month.
  • Self-hosted runners — if you're spending >$500/month on Actions, self-hosted runners on your own infrastructure (or spot instances) often pay for themselves

Trivia: GitHub Actions provides unlimited free minutes for public repositories. That decision at the 2019 launch undercut Travis CI, which had been the dominant CI platform for open source. Travis CI, acquired by Idera in early 2019, laid off much of its engineering team that year, and open-source projects migrated to Actions en masse over the following years.


Part 13: The Complete Pipeline

Here's everything assembled — a production-grade workflow for the Python API we started with. Every annotation refers back to a section of this lesson.

name: CI/CD Pipeline

on:
  push:
    branches: [main]
    paths:
      - 'app/**'
      - 'tests/**'
      - 'requirements*.txt'
      - 'Dockerfile'
      - '.github/workflows/ci.yml'
  pull_request:
    branches: [main]

permissions:
  contents: read                          # Part 8: least privilege default

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
  # --- CI Gate ---
  lint:
    runs-on: ubuntu-latest
    timeout-minutes: 5
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
          cache: 'pip'
      - run: pip install ruff
      - run: ruff check app/ tests/

  test:
    runs-on: ubuntu-latest
    timeout-minutes: 10
    needs: lint                           # Part 2: dependency chain
    strategy:
      fail-fast: false                    # Part 3: see all failures at once
      matrix:
        python-version: ['3.10', '3.11', '3.12']
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
          cache: 'pip'
      - run: pip install -r requirements.txt -r requirements-test.txt
      - run: pytest --cov=app --cov-fail-under=90 -v

  # --- Docker Build ---
  build-image:
    runs-on: ubuntu-latest
    needs: test                           # all matrix legs must pass
    if: github.ref == 'refs/heads/main'
    permissions:
      contents: read
      packages: write                     # Part 5: push to GHCR
    outputs:
      image-tag: ${{ steps.meta.outputs.version }}
    steps:
      - uses: actions/checkout@v4
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - uses: docker/metadata-action@v5
        id: meta
        with:
          images: ghcr.io/${{ github.repository }}
          tags: |
            type=sha,prefix=
            type=ref,event=branch
      - uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max

  # --- Deploy to Staging ---
  deploy-staging:
    runs-on: ubuntu-latest
    needs: build-image
    if: github.ref == 'refs/heads/main'
    environment: staging                  # Part 6: environment protection
    permissions:
      id-token: write                     # Part 6: OIDC
      contents: read
    concurrency:
      group: deploy-staging
      cancel-in-progress: false
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/github-actions-staging
          aws-region: us-east-1
      - name: Deploy to staging
        run: |
          aws ecs update-service \
            --cluster api-staging \
            --service api \
            --force-new-deployment

  # --- Deploy to Production ---
  deploy-production:
    runs-on: ubuntu-latest
    needs: deploy-staging
    if: github.ref == 'refs/heads/main'
    environment: production               # requires manual approval in GitHub settings
    permissions:
      id-token: write
      contents: read
    concurrency:
      group: deploy-production
      cancel-in-progress: false           # Part 11: never cancel deploys
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/github-actions-production
          aws-region: us-east-1
      - name: Deploy to production
        run: |
          aws ecs update-service \
            --cluster api-production \
            --service api \
            --force-new-deployment

  # --- Notify on Failure ---
  notify:
    runs-on: ubuntu-latest
    needs: [lint, test, build-image, deploy-staging, deploy-production]
    if: failure()                         # Part 4: status check function
    steps:
      - name: Send failure notification
        run: |
          curl -X POST "${{ secrets.SLACK_WEBHOOK }}" \
            -H 'Content-Type: application/json' \
            -d '{"text": "Pipeline failed for ${{ github.repository }}@${{ github.sha }}"}'
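One refinement worth considering for the deploy jobs above: `aws ecs update-service --force-new-deployment` returns as soon as the deployment is *initiated*, not when it finishes. If a bad image crash-loops, the pipeline still shows green. To make the job fail when the rollout fails, block on `aws ecs wait services-stable` after the update (a sketch — cluster and service names match the hypothetical ones above; the waiter polls until steady state or exits non-zero after roughly 10 minutes):

```yaml
      - name: Wait for rollout to stabilize
        run: |
          # Blocks until the service reaches a steady state.
          # Exits non-zero on timeout, which fails the job.
          aws ecs wait services-stable \
            --cluster api-production \
            --services api
```

Note the flag is `--services` (plural) on the waiter, unlike `--service` on `update-service`.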

The flow

push to main
    |
    v
  [lint] ──fail──> stop
    |
    v (pass)
  [test: 3.10] ─┐
  [test: 3.11] ─┼─ any fail ──> stop
  [test: 3.12] ─┘
    |
    v (all pass)
  [build-image] ──> push to GHCR
    |
    v
  [deploy-staging] ──> OIDC auth ──> ECS update
    |
    v
  [deploy-production] ──> manual approval gate ──> OIDC auth ──> ECS update
    |
    v
   done (or [notify] on any failure)

Exercises

Exercise 1: Your first workflow (2 minutes)

Create .github/workflows/hello.yml in any repository:

name: Hello
on: [push]
jobs:
  greet:
    runs-on: ubuntu-latest
    steps:
      - run: echo "SHA is ${{ github.sha }}"

Push it. Watch it run in the Actions tab.

What to look for:

- The workflow appears in the Actions tab within seconds
- The log shows the full SHA, not the literal string `${{ github.sha }}`
- The runner is a fresh Ubuntu VM — check the "Set up job" step for the image version


Exercise 2: Add a matrix (10 minutes)

Take the hello workflow and make it run on three different Ubuntu versions: ubuntu-22.04, ubuntu-24.04, and ubuntu-latest.

Add a step that runs `cat /etc/os-release` so you can see the actual OS version for each.

Hint
strategy:
  matrix:
    os: [ubuntu-22.04, ubuntu-24.04, ubuntu-latest]
runs-on: ${{ matrix.os }}
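A full solution might look like this (a sketch, with `fail-fast: false` added so one OS failing doesn't cancel the others):

```yaml
name: OS Matrix
on: [push]
jobs:
  greet:
    strategy:
      fail-fast: false
      matrix:
        os: [ubuntu-22.04, ubuntu-24.04, ubuntu-latest]
    runs-on: ${{ matrix.os }}
    steps:
      - name: Show OS version
        run: cat /etc/os-release
```

You'll notice `ubuntu-latest` is just an alias for one of the pinned versions, so two of the three matrix legs will print identical output — a good reminder that `latest` silently moves under you.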

Exercise 3: Build a secure pipeline (30 minutes)

Take the complete pipeline from Part 13 and adapt it for a Node.js application. You'll need to:

  1. Replace setup-python with setup-node
  2. Replace pytest with your test runner (jest, vitest, etc.)
  3. Replace ruff with eslint or biome
  4. Keep OIDC, matrix builds, and SHA-pinned actions

Bonus: add a step that runs npm audit and fails the build if high-severity vulnerabilities are found.

Key differences:

- `setup-node` uses `cache: 'npm'` (or `cache: 'pnpm'`)
- Node.js has more OS-dependent behavior than Python — matrix testing across OS matters more
- `npm audit --audit-level=high` exits non-zero on high/critical vulns
- `npm ci` is the CI equivalent of `npm install` — deterministic, uses lockfile only
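Putting those pieces together, the adapted test job might look like this (a sketch — the Node versions, eslint invocation, and audit step are assumptions; adjust for your project):

```yaml
  test:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        node: ['18', '20', '22']
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node }}
          cache: 'npm'
      - run: npm ci        # lockfile-only, deterministic install
      - run: npx eslint .
      - run: npm test
      - name: Fail on high-severity vulnerabilities
        run: npm audit --audit-level=high
```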

Cheat Sheet

| What | How |
| --- | --- |
| Trigger on push | `on: push: branches: [main]` |
| Skip CI for docs-only changes | `on: push: paths-ignore: ['docs/**', '**.md']` |
| Manual trigger | `on: workflow_dispatch:` |
| Job dependency | `needs: [lint, test]` |
| Matrix build | `strategy: matrix: python: ['3.10', '3.12']` |
| See all failures | `strategy: fail-fast: false` |
| Cache deps | `setup-python` with `cache: 'pip'`, or `actions/cache@v4` |
| Pass data between jobs | `outputs:` + `${{ needs.job.outputs.key }}` |
| OIDC auth | `permissions: id-token: write` + `aws-actions/configure-aws-credentials` |
| Pin action to SHA | `uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11` |
| Cancel stale CI | `concurrency: group: ci-${{ github.ref }}` + `cancel-in-progress: true` |
| Queue deploys | `concurrency: group: deploy-prod` + `cancel-in-progress: false` |
| Run on failure only | `if: failure()` |
| Run always | `if: always()` |
| Environment gate | `environment: production` (set approval rules in repo settings) |
| Debug a run | `gh run view <id> --log-failed` |
| Rerun failures | `gh run rerun <id> --failed` |
| Test locally | `act push -j build` |

Takeaways

  • OIDC over static keys, always. The single biggest security win for any CI/CD pipeline. No credentials to leak, no keys to rotate.

  • Pin third-party actions to SHAs. Tags are mutable pointers. A compromised tag hijacks every workflow that references it. SHAs are immutable.

  • Set fail-fast: false on test matrices. See all failures in one run, not three sequential "fix and retry" cycles.

  • Declare permissions explicitly. The default might be write-all if your repo predates February 2023. Least privilege isn't paranoia — it's the difference between a nuisance PR and a full repo compromise.

  • CI/CD lives in your repo, which means it's code. Review workflow changes like you review application code. A one-line YAML change can expose every secret in your org.

  • The environment feature is your production gate. Required reviewers, wait timers, and branch restrictions on environments prevent accidental (or malicious) deploys.
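The least-privilege takeaway is cheap to act on. A single workflow-level `permissions` block overrides whatever the repository default is, and individual jobs escalate only what they need (a sketch):

```yaml
# Workflow-level default: the GITHUB_TOKEN is read-only everywhere
permissions:
  contents: read

jobs:
  deploy:
    permissions:
      id-token: write    # only this job may mint OIDC tokens
      contents: read     # job-level permissions replace, not extend, the default
```

Note that a job-level `permissions` block replaces the workflow-level one entirely, so `contents: read` must be restated if the job still needs it.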