CI/CD Pipelines - Primer

Why This Matters

Name origin: The term "Continuous Integration" was coined by Grady Booch in 1991 and popularized by Kent Beck's Extreme Programming (XP) in 1999. Martin Fowler's 2006 article "Continuous Integration" became the canonical reference. "Continuous Delivery" was formalized by Jez Humble and David Farley in their 2010 book of the same name. The key distinction: CI is about keeping the codebase always buildable; CD is about keeping it always deployable.

CI/CD (Continuous Integration / Continuous Delivery) is the backbone of modern software delivery. It automates building, testing, and deploying code so teams can ship reliably and frequently. As a DevOps engineer, you'll design, build, maintain, and troubleshoot these pipelines daily.

Core Concepts

Continuous Integration (CI)

Every code change is automatically built and tested when pushed. The goal: catch problems early and keep the main branch always buildable and passing its tests.

CI typically includes:

  1. Code checkout
  2. Dependency installation
  3. Linting / static analysis
  4. Unit tests
  5. Build artifacts (binaries, container images)
  6. Integration tests

Continuous Delivery (CD)

Automatically deploy every change that passes CI to staging (and optionally production with manual approval). Continuous Deployment goes further: every passing change goes to production automatically, no human gate.

Code Push -> Build -> Test -> Stage Deploy -> [Manual Approval] -> Prod Deploy
             ^                                                      ^
         Continuous Integration                          Continuous Delivery
         |<---------------------------------------------------->|
                        Continuous Deployment (if fully automated)

Pipeline Anatomy (GitHub Actions)

GitHub Actions is the most common CI/CD platform for GitHub-hosted projects. A workflow is defined in .github/workflows/*.yml.

name: CI Pipeline

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run linter
        run: make lint

  test:
    runs-on: ubuntu-latest
    needs: lint              # Runs after lint passes
    steps:
      - uses: actions/checkout@v4
      - name: Run tests
        run: make test

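  build:
    runs-on: ubuntu-latest
    needs: test
    if: github.event_name == 'push'   # Build and deploy only on pushes to main, not on PRs
    permissions:
      contents: read
      packages: write                 # Lets GITHUB_TOKEN push to ghcr.io
    steps:
      - uses: actions/checkout@v4
      - name: Log in to registry
        run: echo "${{ secrets.GITHUB_TOKEN }}" | docker login ${{ env.REGISTRY }} -u ${{ github.actor }} --password-stdin
      - name: Build image
        run: docker build -t ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }} .
      - name: Push image
        run: docker push ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}

  deploy-staging:
    runs-on: ubuntu-latest
    needs: build             # Skipped automatically when build is skipped
    environment: staging     # Can have protection rules
    steps:
      - uses: actions/checkout@v4     # deploy.sh lives in the repository
      - name: Deploy to staging
        run: ./deploy.sh staging ${{ github.sha }}

  deploy-production:
    runs-on: ubuntu-latest
    needs: deploy-staging
    environment: production  # Manual approval if required reviewers are configured
    steps:
      - uses: actions/checkout@v4
      - name: Deploy to production
        run: ./deploy.sh production ${{ github.sha }}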

Key Components

Triggers (on:)

on:
  push:
    branches: [main]
    paths: ['src/**']         # Only trigger if src/ changes
  pull_request:
    branches: [main]
  schedule:
    - cron: '0 6 * * 1'      # Weekly on Monday at 06:00 UTC
  workflow_dispatch:           # Manual trigger button

Jobs and Steps

  • A job runs on a fresh VM (runner). Jobs run in parallel by default.
  • needs: creates dependencies between jobs (sequential execution).
  • Each step is either a shell command (run:) or a reusable action (uses:).
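
To make the effect of needs: concrete, the sketch below models the jobs from the workflow above as a dependency graph and groups them into waves that could run in parallel. It is only an illustration using Python's standard graphlib module, not a description of how GitHub schedules jobs; the function name execution_waves and the extra docs job are hypothetical.

```python
from graphlib import TopologicalSorter


def execution_waves(needs):
    """Group jobs into waves; every job in a wave has all of its needs satisfied."""
    sorter = TopologicalSorter(needs)
    sorter.prepare()  # raises graphlib.CycleError if the needs graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Job name -> set of jobs it needs. Mirrors the workflow above, plus one
# hypothetical independent job ("docs") to show parallelism.
NEEDS = {
    "lint": set(),
    "docs": set(),
    "test": {"lint"},
    "build": {"test"},
    "deploy-staging": {"build"},
    "deploy-production": {"deploy-staging"},
}

for number, wave in enumerate(execution_waves(NEEDS), start=1):
    print(f"wave {number}: {', '.join(wave)}")
```

Jobs with no needs: land in the first wave together, which is the "parallel by default" behavior; each needs: edge pushes a job into a later wave.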

Runners

  • GitHub-hosted runners: ubuntu-latest, windows-latest, macos-latest
  • Self-hosted runners: your own machines for private networks, special hardware, or cost savings

Secrets

steps:
  - name: Deploy
    env:
      AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
      AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
    run: aws s3 sync ./dist s3://my-bucket

Secrets are stored in the repository, organization, or environment settings on GitHub, encrypted at rest, and masked in logs. Never hardcode secrets in workflow files.

Artifacts

# Upload
- uses: actions/upload-artifact@v4
  with:
    name: build-output
    path: dist/

# Download in another job
- uses: actions/download-artifact@v4
  with:
    name: build-output

Caching

- uses: actions/cache@v4
  with:
    path: ~/.npm
    key: npm-${{ hashFiles('package-lock.json') }}
    restore-keys: npm-
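
Caching dependencies can cut minutes from every run. The cache key should change whenever the dependencies change, which is why it hashes the lockfile. restore-keys is a fallback: when there is no exact match, the most recent cache whose key starts with npm- is restored.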
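
The idea behind the key can be sketched in a few lines of Python: hash the lockfile and append the digest to a prefix, so any dependency change produces a new key. This is a simplified stand-in for hashFiles, not its implementation; cache_key is a hypothetical helper and the lockfile contents are made up.

```python
import hashlib
import tempfile
from pathlib import Path


def cache_key(prefix, lockfile):
    """Return '<prefix>-<sha256 of the lockfile contents>'."""
    digest = hashlib.sha256(Path(lockfile).read_bytes()).hexdigest()
    return f"{prefix}-{digest}"


with tempfile.TemporaryDirectory() as tmp:
    lock = Path(tmp) / "package-lock.json"

    lock.write_text('{"lodash": "4.17.20"}')
    before = cache_key("npm", lock)

    lock.write_text('{"lodash": "4.17.21"}')  # a dependency was bumped
    after = cache_key("npm", lock)

    print("key changed:", before != after)
    print("fallback prefix still matches:", after.startswith("npm-"))
```

An unchanged lockfile always yields the same key (cache hit); a changed one misses on the exact key and falls back to the prefix match.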

Environments

Environments (staging, production) can have:

  • Required reviewers (manual approval gate)
  • Wait timers
  • Deployment branch restrictions
  • Environment-specific secrets

Matrix Builds

Test across multiple versions/platforms:

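jobs:
  test:
    strategy:
      matrix:
        node-version: [18, 20, 22]
        os: [ubuntu-latest, windows-latest]
    runs-on: ${{ matrix.os }}    # 3 versions x 2 platforms = 6 jobs
    steps:
      - uses: actions/checkout@v4          # Without this there is no code to test
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node-version }}
      - run: npm ci                        # Install dependencies from the lockfile
      - run: npm test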
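
A matrix is expanded into one job per combination of its values. The following Python sketch shows that expansion as a Cartesian product; expand_matrix is a hypothetical name, and real matrices also support include/exclude rules that are not modeled here.

```python
from itertools import product


def expand_matrix(matrix):
    """Return one dict per combination of matrix values."""
    keys = list(matrix)
    combos = product(*(matrix[key] for key in keys))
    return [dict(zip(keys, combo)) for combo in combos]


MATRIX = {
    "node-version": [18, 20, 22],
    "os": ["ubuntu-latest", "windows-latest"],
}

jobs = expand_matrix(MATRIX)
print(f"{len(jobs)} jobs")
for job in jobs:
    print(job)
```

The job count is the product of the list lengths, so adding a third axis with three values would turn 6 jobs into 18. Keep that multiplication in mind when budgeting runner minutes.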

Reusable Workflows

DRY principle for pipelines:

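# .github/workflows/reusable-deploy.yml
on:
  workflow_call:
    inputs:
      environment:
        required: true
        type: string
    secrets:
      deploy-key:
        required: true

jobs:
  deploy:
    runs-on: ubuntu-latest
    environment: ${{ inputs.environment }}
    steps:
      - uses: actions/checkout@v4
      - name: Deploy
        env:
          DEPLOY_KEY: ${{ secrets.deploy-key }}   # Passed in by the caller
        run: ./deploy.sh ${{ inputs.environment }}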

# Caller workflow
jobs:
  deploy:
    uses: ./.github/workflows/reusable-deploy.yml
    with:
      environment: staging
    secrets:
      deploy-key: ${{ secrets.DEPLOY_KEY }}

Deployment Strategies

Rolling Update

Replace instances gradually, one (or a small batch) at a time. RollingUpdate is the default strategy for Kubernetes Deployments.

Time 0: [v1] [v1] [v1] [v1]
Time 1: [v2] [v1] [v1] [v1]   ← first pod updated
Time 2: [v2] [v2] [v1] [v1]
Time 3: [v2] [v2] [v2] [v1]
Time 4: [v2] [v2] [v2] [v2]   ← complete
  • Pros: Zero downtime, minimal resource overhead, built into most orchestrators.
  • Cons: Both versions serve traffic simultaneously during rollout. If v2 is incompatible with v1 (DB schema change), you get errors.
  • Rollback: Reverse the rolling update. Takes time proportional to fleet size.

Blue-Green

Run two identical environments. Switch traffic all at once.

          ┌──── Load Balancer ────┐
          │                       │
    [Blue: v1] ←── live     [Green: v2] ←── staging
          │                       │
    (switch)                      │
          │                       │
    [Blue: v1] ←── idle     [Green: v2] ←── live
  • Pros: Instant switchover. Instant rollback (switch back).
  • Cons: Requires 2x infrastructure. Database migrations must be backward-compatible.
  • Rollback: Point the load balancer back to blue.

Canary

Send a small percentage of traffic to the new version. Monitor. Gradually increase.

Time 0:  v1 ████████████████████ 100%    v2 ░ 0%
Time 1:  v1 ██████████████████░░  95%    v2 █ 5%
Time 2:  v1 ████████████████░░░░  80%    v2 ████ 20%
Time 3:  v1 ██████████░░░░░░░░░░  50%    v2 ██████████ 50%
Time 4:  v1 ░░░░░░░░░░░░░░░░░░░░   0%    v2 ████████████████████ 100%
  • Pros: Limits blast radius. Real production traffic validates the new version.
  • Cons: Requires traffic splitting (service mesh, load balancer rules). Needs monitoring and rollback criteria.
  • Rollback: Route 100% back to v1. Fast.
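
"Monitor, then gradually increase" implies an explicit promotion rule. A minimal sketch of such a rule follows, assuming error rate is the only health signal; the function name next_step, the step sizes (taken from the timeline above), and the tolerance value are all illustrative choices rather than recommendations.

```python
def next_step(current_percent, canary_error_rate, baseline_error_rate,
              steps=(5, 20, 50, 100), tolerance=0.01):
    """Return the next canary traffic percentage, or 0 to roll back.

    The canary is rolled back when its error rate exceeds the baseline
    (v1) error rate by more than `tolerance`.
    """
    if canary_error_rate > baseline_error_rate + tolerance:
        return 0  # route all traffic back to v1
    for step in steps:
        if step > current_percent:
            return step
    return 100  # already fully rolled out


print(next_step(5, canary_error_rate=0.002, baseline_error_rate=0.001))   # healthy: promote
print(next_step(20, canary_error_rate=0.050, baseline_error_rate=0.001))  # unhealthy: roll back
```

Writing the rollback criterion down before the rollout starts is the important part; real setups usually compare several signals (latency, saturation, business metrics) over a fixed observation window.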

Recreate

Kill all old instances, start new ones. Simplest strategy.

  • Pros: No version mixing. Simple.
  • Cons: Downtime between shutdown and startup.
  • Use when: Downtime is acceptable (batch jobs, internal tools, dev environments).

Artifact Management

Build once, deploy many times. The artifact from CI should be the exact same artifact deployed to dev, staging, and prod.

CI Build → [Container Registry]
           Deploy to dev    (myapp:abc123)
           Deploy to staging (myapp:abc123)  ← same image
           Deploy to prod    (myapp:abc123)  ← same image

Tag artifacts with the Git SHA. Avoid latest for anything beyond dev -- it is ambiguous and unreproducible.

Default trap: The latest tag on container images is not "the most recent build." It is simply the default tag Docker applies when you do not specify one. If two developers push latest from different branches, the second one silently overwrites the first. In production, latest is a debugging nightmare because you cannot tell which commit is running. Always use Git SHA or semantic version tags.
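
A pipeline can enforce this tagging rule mechanically. The sketch below builds an image reference from a full Git SHA and refuses to deploy latest (or an untagged image, which Docker treats as latest) outside dev. Both function names are hypothetical, and the tag parsing is deliberately simple: it does not handle digest references such as name@sha256:....

```python
import re

FULL_SHA = re.compile(r"^[0-9a-f]{40}$")


def image_ref(registry, name, git_sha):
    """Build '<registry>/<name>:<sha>', insisting on a full 40-character SHA."""
    if not FULL_SHA.match(git_sha):
        raise ValueError(f"expected a full 40-character Git SHA, got {git_sha!r}")
    # Image repository names must be lowercase.
    return f"{registry}/{name.lower()}:{git_sha}"


def is_deployable(ref, environment):
    """Allow 'latest' (explicit or implied) only in dev."""
    last_component = ref.rsplit("/", 1)[-1]  # ignore any registry port
    tag = last_component.split(":", 1)[1] if ":" in last_component else "latest"
    return environment == "dev" or tag != "latest"


ref = image_ref("ghcr.io", "Acme/MyApp", "a" * 40)
print(ref)
print(is_deployable(ref, "prod"))             # SHA-tagged: allowed
print(is_deployable("myapp:latest", "prod"))  # latest: rejected outside dev
```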

Environment Promotion

dev → staging → prod

Each environment should differ only in configuration (secrets, endpoints, replica counts), not in code or artifacts.

# values-dev.yaml
replicas: 1
database_url: postgres://dev-db:5432/app
log_level: debug

# values-prod.yaml
replicas: 5
database_url: postgres://prod-db:5432/app
log_level: warn
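
One way to check the "configuration only" rule is to render each environment from a shared base plus overrides and then compare the results. In this sketch the BASE dictionary, the image tag, and both helper names are assumptions for illustration; the override values come from the two files above.

```python
# Hypothetical settings shared by every environment, including the artifact.
BASE = {"image": "myapp:abc123", "log_level": "info"}

# Per-environment overrides, mirroring values-dev.yaml and values-prod.yaml.
OVERRIDES = {
    "dev": {
        "replicas": 1,
        "database_url": "postgres://dev-db:5432/app",
        "log_level": "debug",
    },
    "prod": {
        "replicas": 5,
        "database_url": "postgres://prod-db:5432/app",
        "log_level": "warn",
    },
}


def render(environment):
    """Merge the shared base with one environment's overrides."""
    return {**BASE, **OVERRIDES[environment]}


def differing_keys(a, b):
    """Return the sorted keys whose values differ between two configs."""
    return sorted(key for key in a.keys() | b.keys() if a.get(key) != b.get(key))


dev, prod = render("dev"), render("prod")
print("same image:", dev["image"] == prod["image"])
print("differs in:", differing_keys(dev, prod))
```

If image ever shows up in the list of differing keys, the environments are no longer running the same artifact and the promotion chain is broken.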

Feature Flags

Decouple deployment from feature release. Ship code to production behind a flag. Enable the flag when ready.

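def checkout(cart, current_user):
    # feature_flags is the flag client (LaunchDarkly, Unleash, a config file, ...)
    if feature_flags.is_enabled("new_checkout_flow", user=current_user):
        return new_checkout(cart)
    return old_checkout(cart)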

This lets you deploy incomplete features (dormant behind flag), canary to specific users, kill a feature instantly without a deploy, and run A/B experiments.

Tools: LaunchDarkly, Unleash, Flagsmith, or a simple config file for basic use cases.
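
For the "simple config file" end of that spectrum, a flag store can be very small. The sketch below is an in-memory illustration, not the API of any of the tools listed: the FeatureFlags class and set_rollout are hypothetical, and only the is_enabled call shape is taken from the snippet above. Users are bucketed by hash so that each user gets a stable answer.

```python
import hashlib


class FeatureFlags:
    """Minimal in-memory flag store with percentage rollouts."""

    def __init__(self):
        self._rollout = {}  # flag name -> percentage of users (0-100)

    def set_rollout(self, flag, percent):
        if not 0 <= percent <= 100:
            raise ValueError("percent must be between 0 and 100")
        self._rollout[flag] = percent

    def is_enabled(self, flag, user):
        percent = self._rollout.get(flag, 0)  # unknown flags are off
        # Hash flag and user together so each flag gets its own user buckets.
        digest = hashlib.sha256(f"{flag}:{user}".encode()).hexdigest()
        return int(digest, 16) % 100 < percent


flags = FeatureFlags()
flags.set_rollout("new_checkout_flow", 100)
print(flags.is_enabled("new_checkout_flow", user="alice"))  # fully rolled out

flags.set_rollout("new_checkout_flow", 0)                   # kill switch
print(flags.is_enabled("new_checkout_flow", user="alice"))
```

Setting the percentage to 0 is the instant kill switch described above: no deploy is involved, only a change to the flag state.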

Trunk-Based Development vs Gitflow

Trunk-Based Development: Everyone integrates into main at least daily, either by committing directly or through short-lived feature branches (< 1 day). CI runs on every commit to main. Works well with feature flags and CI/CD. Requires good test coverage.

Gitflow: Long-lived develop and main branches. Feature branches merge to develop. Release branches cut from develop. Works for software with discrete releases (mobile apps, on-prem). Heavy overhead for web services with continuous deployment.

Merge Queues: Serialize merges to main, running CI on each merge candidate with the latest main. Prevents "merge skew" where two PRs pass CI individually but break when combined.

Pipeline Security

OIDC for Cloud Access

Do not store long-lived cloud credentials as CI secrets. Use OIDC to exchange a short-lived CI token for cloud credentials.

# GitHub Actions with AWS OIDC
permissions:
  id-token: write
  contents: read

steps:
  - uses: aws-actions/configure-aws-credentials@v4
    with:
      role-to-assume: arn:aws:iam::123456789012:role/ci-deploy
      aws-region: us-east-1

Pin Action Versions

# Bad: mutable tag, supply chain risk
- uses: actions/checkout@v4

# Good: pinned to a full commit SHA, with the version noted in a comment
- uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1.1

Testing Pyramid in CI

         /  E2E  \          Few, slow, expensive
        /----------\
       / Integration \      Medium count, medium speed
      /----------------\
     /    Unit Tests    \   Many, fast, cheap
    /--------------------\

CI should run all three tiers, but fail fast — unit tests first.
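
Fail-fast ordering can be expressed as a loop that stops at the first failing tier. The sketch below uses placeholder checks; run_tiers is a hypothetical helper, and in a real pipeline each check would invoke the corresponding test command.

```python
def run_tiers(tiers):
    """Run (name, check) pairs in order and stop at the first failure.

    Returns the list of (name, passed) results for the tiers that ran.
    """
    results = []
    for name, check in tiers:
        passed = bool(check())
        results.append((name, passed))
        if not passed:
            break  # skip the slower, more expensive tiers
    return results


# Placeholder checks, ordered cheapest first.
tiers = [
    ("unit", lambda: True),
    ("integration", lambda: False),  # simulated failure
    ("e2e", lambda: True),
]

for name, passed in run_tiers(tiers):
    print(f"{name}: {'pass' if passed else 'FAIL'}")
```

Because the integration tier fails, the E2E tier never runs, so the developer gets feedback after the cheap tiers instead of waiting for the slowest one.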

Rollback Strategies

Revert Deploy: Deploy the previous known-good version.

kubectl rollout undo deployment/myapp   # Kubernetes: back to the previous revision
helm rollback myapp 1                    # Helm: back to revision 1 (omit the number for the previous release)

Feature Flag Kill Switch: Disable the flag. Code stays deployed but the feature is off.

Forward Fix: Fix the bug and deploy again. Faster than rollback when the fix is obvious and the test suite is fast.

Database Rollback: The hardest part. Migrations should always be backward-compatible. Use the expand-and-contract pattern.

Remember: Mnemonic for the expand-and-contract pattern: "Add, Migrate, Remove" (AMR). Step 1: Add the new column/table alongside the old one. Step 2: Migrate code to use the new schema (deploy new code). Step 3: Remove the old column/table only after all code uses the new one. Never modify or delete a column that running code depends on.
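
The three AMR steps can be walked through on a throwaway SQLite database. This is a sketch under stated assumptions: the users table and the rename from name to full_name are invented for illustration, and in practice each step ships as a separate deploy with time in between. The contract step rebuilds the table rather than dropping the column so that it also works on older SQLite versions.

```python
import sqlite3


def columns(conn):
    """Return the column names of the users table."""
    return [row[1] for row in conn.execute("PRAGMA table_info(users)")]


def expand(conn):
    # Step 1 (Add): new column alongside the old one; old code keeps working.
    conn.execute("ALTER TABLE users ADD COLUMN full_name TEXT")


def migrate(conn):
    # Step 2 (Migrate): backfill, while new code starts using full_name.
    conn.execute("UPDATE users SET full_name = name WHERE full_name IS NULL")


def contract(conn):
    # Step 3 (Remove): only after no running code reads or writes name.
    conn.executescript("""
        CREATE TABLE users_new (id INTEGER PRIMARY KEY, full_name TEXT);
        INSERT INTO users_new (id, full_name) SELECT id, full_name FROM users;
        DROP TABLE users;
        ALTER TABLE users_new RENAME TO users;
    """)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('Ada Lovelace')")

for step in (expand, migrate, contract):
    step(conn)
    print(f"after {step.__name__}: {columns(conn)}")

print(conn.execute("SELECT full_name FROM users").fetchall())
```

After expand and migrate, both columns exist, so a rollback to the old code is still safe. Only contract is irreversible, which is why it comes last and on its own schedule.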

What Experienced People Know

  • Pipeline speed matters. A 30-minute pipeline kills developer productivity. Optimize caching, parallelize jobs, and skip unnecessary steps.
  • Flaky tests erode trust in the pipeline. Track and fix them aggressively. A test that fails randomly trains people to ignore failures.
  • Pin action versions to a full commit SHA, not a tag: a SHA is immutable, while a tag such as @v4 can be moved to point at different code.
  • Secrets in CI are a supply-chain risk. Minimize the number of secrets, use short-lived credentials (OIDC), and audit who has access.
  • The pipeline IS infrastructure. Treat workflow files with the same rigor as production code: review PRs, test changes, version control everything.
  • Build once, deploy many. Same artifact goes to staging and production.
  • A 45-minute green pipeline is almost as bad as a broken one. Developers batch changes and skip CI.
  • Rollback should be one-click. Practice it regularly.
  • Monitor pipeline metrics: pass rate, duration P50/P95, flaky test count, time-to-deploy.
  • When the pipeline breaks, fix it before merging anything else. Broken main blocks everyone.
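
The pipeline metrics listed above are cheap to compute once run data is exported. The sketch below assumes each run is recorded as a (duration in minutes, passed) pair and uses the nearest-rank method for percentiles; the data and the function names are made up for illustration.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(runs):
    """Summarize pipeline runs given as (duration_minutes, passed) pairs."""
    durations = [duration for duration, _ in runs]
    return {
        "pass_rate": sum(1 for _, passed in runs if passed) / len(runs),
        "p50": percentile(durations, 50),
        "p95": percentile(durations, 95),
    }


# Hypothetical run history: mostly fast, with one slow failing outlier.
RUNS = [(8, True), (9, True), (7, True), (10, True), (8, True),
        (9, True), (11, True), (8, True), (9, True), (42, False)]

print(summarize(RUNS))
```

The median hides the 42-minute outlier while P95 exposes it, which is why tracking both is more useful than tracking an average.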

Analogy: A CI/CD pipeline is a factory assembly line. Each stage (lint, test, build, deploy) is a station. If one station breaks, the whole line stops. The investment in keeping the line running pays for itself: the 2019 Accelerate State of DevOps report from DORA (Google Cloud's DevOps research program) found that elite-performing teams deploy 208 times more frequently and recover from incidents 2,604 times faster than low performers.

