
Portal | Level: L0: Entry | Domain: DevOps & Tooling

GrokDevOps - DevOps Learning Roadmap

Architecture Overview

                Developer Workstation
                          │
 git push ──► GitHub ──► GitHub Actions CI Pipeline
                          │
                ┌─────────┴──────────┐
                │      CI Jobs       │
                │  ┌──────────────┐  │
                │  │ Lint & Test  │  │
                │  │ Build Image  │  │
                │  │ Scan & SBOM  │  │
                │  │ Push to GHCR │  │
                │  └──────────────┘  │
                └─────────┬──────────┘
                          │
                          ▼
                  ┌────────────────┐
                  │   Container    │
                  │    Registry    │
                  │     (GHCR)     │
                  └───────┬────────┘
                          │
             ┌────────────┼────────────┐
             ▼            ▼            ▼
        ┌────────┐  ┌──────────┐  ┌────────┐
        │  Dev   │  │ Staging  │  │  Prod  │
        │  k3s   │  │ k3s/EKS  │  │  EKS   │
        └────────┘  └──────────┘  └────────┘

       Helm charts deploy to each environment

Pipeline Lifecycle

1. Code Change

Developer pushes code to GitHub.

2. CI Pipeline Triggers

GitHub Actions runs on push to main/develop or on pull requests:

Job               Purpose
lint              Ruff linter and formatter checks
test              Pytest with coverage (70% floor)
validate          Helm lint, Ansible syntax, YAML lint, Shellcheck
docker-build      Multi-stage Docker build, push to GHCR
terraform         Terraform fmt and validate
security          Trivy vulnerability scan, SBOM
dependency-audit  pip-audit on Python dependencies
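
The 70% floor in the test job is normally enforced by the coverage tooling itself (pytest-cov has a --cov-fail-under option for this). To make the check concrete, here is a minimal sketch of the same gate, assuming a coverage.py JSON report whose totals section carries percent_covered. The function name and the sample report are hypothetical and only illustrate the idea.

```python
import json

COVERAGE_FLOOR = 70.0  # percent, taken from the test job description


def coverage_meets_floor(report_text: str, floor: float = COVERAGE_FLOOR) -> bool:
    """Return True when the report's total coverage is at or above `floor`."""
    totals = json.loads(report_text)["totals"]
    return float(totals["percent_covered"]) >= floor


# Hypothetical, trimmed-down report used only to exercise the function.
sample_report = json.dumps({"totals": {"percent_covered": 72.5}})
print(coverage_meets_floor(sample_report))
```

A CI step built on this would exit non-zero when the function returns False, which fails the job.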

3. Artifact Production

  • Container image pushed to GHCR with SHA-based tags
  • SBOM generated in SPDX format
  • Coverage report archived

4. Deployment (current: manual, future: GitOps)

  • Now: helm upgrade via script
  • Future: ArgoCD watches Git for image digest changes
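
As a rough illustration of the current manual step, the sketch below assembles a helm upgrade command for one environment. The release name and namespace come from the port-forward command in Quick Reference; the chart path and the image.tag value key are assumptions, since the chart itself is not shown here.

```python
import shlex
from pathlib import PurePosixPath

ENVIRONMENTS = ("dev", "staging", "prod")


def helm_upgrade_command(
    env: str,
    image_tag: str,
    chart_dir: str = "devops/helm/grokdevops",  # hypothetical chart location
) -> list[str]:
    """Build the argument list for a per-environment helm upgrade."""
    if env not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {env!r}")
    values_file = PurePosixPath(chart_dir) / f"values-{env}.yaml"
    return [
        "helm", "upgrade", "--install", "grokdevops", chart_dir,
        "--namespace", "grokdevops", "--create-namespace",
        "--values", str(values_file),
        # Assumes the chart reads the tag from an `image.tag` value.
        "--set", f"image.tag={image_tag}",
        "--wait",
    ]


cmd = helm_upgrade_command("dev", "abc1234")
print(shlex.join(cmd))
```

Returning a list rather than a string keeps the command safe to pass to subprocess.run without shell quoting issues.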

Artifact Flow

Source Code
    │
    ▼
Python Package (requirements.txt)
    │
    ▼
Container Image (multi-stage Dockerfile)
    │
    ├──► GHCR (tagged by git SHA)
    ├──► SBOM artifact
    └──► Vulnerability scan report
    │
    ▼
Helm Release (values per environment)
    │
    ▼
Running Pods in Kubernetes

Environment Promotion Model

┌─────────┐     digest      ┌───────────┐    approval     ┌─────────┐
│   Dev   │ ──────────────► │  Staging  │ ───────────────► │  Prod   │
│         │  auto-promote   │           │  manual gate     │         │
└─────────┘                 └───────────┘                  └─────────┘

 values-dev.yaml           values-staging.yaml           values-prod.yaml
 replicas: 1               replicas: 2                   replicas: 3
 NodePort                  ClusterIP                     ClusterIP + HPA
 debug logging             info logging                  warning logging

Promotion is based on image digest, not mutable tags. This ensures the exact image tested in staging is what runs in production.
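
The promotion rules above can be sketched in a few lines. This is an illustrative model only, not the project's promotion tooling: the Release class, the promote function, and the example repository name are hypothetical. It shows the two ideas from the diagram: images are referenced by digest, and the step into prod requires an explicit approval.

```python
import re
from dataclasses import dataclass

DIGEST_RE = re.compile(r"^sha256:[0-9a-f]{64}$")
ENVIRONMENTS = ("dev", "staging", "prod")  # promotion order from the diagram


@dataclass(frozen=True)
class Release:
    repository: str  # e.g. "ghcr.io/example-org/grokdevops" (hypothetical)
    digest: str      # immutable content digest, "sha256:<64 hex chars>"

    def image_ref(self) -> str:
        # Digest references use "@", unlike mutable "repo:tag" references.
        return f"{self.repository}@{self.digest}"


def promote(release: Release, source: str, approved: bool = False) -> tuple[str, str]:
    """Return (target_env, image_ref) for promoting `release` out of `source`."""
    if not DIGEST_RE.match(release.digest):
        raise ValueError("promotion requires a sha256 digest, not a tag")
    if source not in ENVIRONMENTS[:-1]:
        raise ValueError(f"cannot promote from {source!r}")
    target = ENVIRONMENTS[ENVIRONMENTS.index(source) + 1]
    # Dev -> staging is automatic; staging -> prod needs a manual approval.
    if target == "prod" and not approved:
        raise PermissionError("promotion to prod requires manual approval")
    return target, release.image_ref()
```

Because the same digest string is carried from one environment to the next, the reference deployed to prod is byte-for-byte the reference that was deployed to staging.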

Infrastructure Provisioning Plan

Local (Stage 1 - Now)

  • k3s single-node cluster
  • Ansible playbook bootstraps k3s
  • Helm deploys application

AWS (Stage 2 - Future)

  • Terraform provisions VPC, EKS, IAM
  • Managed node groups
  • IRSA for pod-level AWS permissions
  • GitHub Actions OIDC for keyless CI/CD auth

devops/terraform/
├── modules/
│   ├── vpc/       # VPC, subnets, NAT
│   ├── eks/       # EKS cluster, node groups
│   └── iam/       # Roles, policies, OIDC
└── environments/
    ├── dev/
    ├── staging/
    └── prod/

Expansion Stages

Stage 1: Foundation (Implemented)

Skills learned:

  • Python web service development
  • Container image building (multi-stage)
  • GitHub Actions CI pipeline design
  • Kubernetes deployment with Helm
  • Container security scanning
  • SBOM generation
  • Dependency auditing
  • Git-based workflow

What you have:

  • FastAPI service with health/version endpoints
  • Production Dockerfile (multi-stage, non-root, healthcheck)
  • CI pipeline (lint → test → build → scan)
  • Helm chart with per-environment values
  • Local deployment scripts for k3s
  • Terraform and Ansible scaffolding

Stage 2: Platform Engineering (Scaffold Ready)

Skills to learn:

  • GitOps with ArgoCD or Flux
  • Infrastructure as Code with Terraform
  • AWS networking (VPC, subnets, NAT)
  • Managed Kubernetes (EKS)
  • IAM and security boundaries
  • Secrets management (External Secrets Operator)
  • Configuration management with Ansible

Implementation path:

  1. Install ArgoCD on k3s
  2. Create Kustomize overlays
  3. Implement Terraform VPC module
  4. Implement Terraform EKS module
  5. Set up GitHub OIDC for AWS auth
  6. Add External Secrets Operator

Stage 3: Production Patterns (Documentation)

Skills to learn:

  • Blue/green deployments
  • Canary releases (Argo Rollouts)
  • Horizontal Pod Autoscaling
  • Pod Disruption Budgets
  • Distributed worker systems
  • Queue-based architectures
  • Node upgrade strategies
  • Chaos engineering (Litmus, Chaos Mesh)

Implementation path:

  1. Replace Deployment with Argo Rollout
  2. Configure analysis templates for canary
  3. Add HPA manifests
  4. Create worker deployment with queue consumer
  5. Implement node drain/upgrade playbooks
  6. Add chaos experiments

Observability (Implemented)

Metrics: Application -> Prometheus -> Grafana

FastAPI (/metrics endpoint)
    │
    ▼
ServiceMonitor (selects app Service)
    │
    ▼
Prometheus (scrapes every 30s)
    │
    ▼
Grafana (dashboards, alerts)

Metrics exposed: http_requests_total, http_request_duration_seconds
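
To show what a scrape of /metrics might return for these two metrics, here is a stdlib-only sketch that renders them in the Prometheus text exposition format. It is not the service's implementation, which would more likely use a client library such as prometheus_client. The label names, the bucket bounds, and the choice of a histogram for the duration metric are all assumptions.

```python
from collections import defaultdict

BUCKETS = (0.1, 0.5, 1.0, float("inf"))  # illustrative upper bounds, in seconds


class Metrics:
    def __init__(self) -> None:
        self.requests = defaultdict(int)          # (method, path, status) -> count
        self.bucket_counts = [0] * len(BUCKETS)   # cumulative count per upper bound
        self.duration_sum = 0.0

    def observe(self, method: str, path: str, status: int, seconds: float) -> None:
        """Record one finished request."""
        self.requests[(method, path, status)] += 1
        self.duration_sum += seconds
        # Histogram buckets are cumulative: bump every bucket the value fits in.
        for i, bound in enumerate(BUCKETS):
            if seconds <= bound:
                self.bucket_counts[i] += 1

    def render(self) -> str:
        """Return the metrics as Prometheus text exposition lines."""
        lines = ["# TYPE http_requests_total counter"]
        for (method, path, status), count in sorted(self.requests.items()):
            lines.append(
                f'http_requests_total{{method="{method}",path="{path}",status="{status}"}} {count}'
            )
        lines.append("# TYPE http_request_duration_seconds histogram")
        for bound, count in zip(BUCKETS, self.bucket_counts):
            le = "+Inf" if bound == float("inf") else str(bound)
            lines.append(f'http_request_duration_seconds_bucket{{le="{le}"}} {count}')
        lines.append(f"http_request_duration_seconds_sum {self.duration_sum}")
        lines.append(f"http_request_duration_seconds_count {self.bucket_counts[-1]}")
        return "\n".join(lines) + "\n"
```

Calling observe once per request and returning render() from the /metrics handler gives Prometheus something to scrape on its 30-second interval.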

Logging: Application -> Promtail -> Loki -> Grafana

Application (stdout/stderr)
    │
    ▼
Promtail (DaemonSet, collects container logs)
    │
    ▼
Loki (stores and indexes by labels)
    │
    ▼
Grafana (LogQL queries)
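
Since the pipeline starts from whatever the application writes to stdout, one common approach is to emit one JSON object per log line so that fields can be parsed later in LogQL. The sketch below uses only the standard logging module; the JsonFormatter class, the configure_logging helper, and the "grokdevops" logger name are illustrative assumptions, not the service's actual setup.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Format each record as a single-line JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exc_info"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def configure_logging(level: str = "INFO") -> logging.Logger:
    """Send JSON logs to stdout, where the container runtime captures them."""
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger("grokdevops")
    logger.handlers = [handler]   # replace any handlers from earlier calls
    logger.setLevel(level)
    logger.propagate = False      # avoid duplicate lines via the root logger
    return logger
```

The level argument maps onto the per-environment settings in the promotion table, for example "DEBUG" in dev and "WARNING" in prod.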

Tracing: Application -> Tempo -> Grafana

Application (OTLP exporter, future)
    │
    ▼
Tempo (trace storage)
    │
    ▼
Grafana (trace visualization)

Install with:

./devops/scripts/install-observability.sh

See observability.md for details.

Security Pipeline (Expanding)

Implemented

  • Container vulnerability scanning (Trivy)
  • SBOM generation (Anchore/Syft)
  • Dependency auditing (pip-audit)

Stage 2 Additions

  • Image signing with Cosign/Sigstore
  • Admission control with Kyverno or OPA Gatekeeper
  • Network policies
  • RBAC hardening
  • Secret rotation

Stage 3 Additions

  • Runtime security (Falco)
  • eBPF-based monitoring
  • Supply chain security (SLSA)
  • Penetration testing automation

Quick Reference

Run the CI pipeline locally

# Install dev dependencies
pip install -r requirements-dev.txt

# Lint
ruff check .
ruff format --check .

# Test
pytest --cov=app

# Build image
./devops/docker/build.sh

Deploy to local k3s

# Full workflow
./devops/scripts/workflow.sh

# Or step by step
./devops/docker/build.sh
./devops/scripts/deploy-local.sh

# Access
kubectl port-forward -n grokdevops svc/grokdevops 8000:80
curl http://localhost:8000/health

Tear down

./devops/scripts/deploy-local.sh --uninstall