GrokDevOps - DevOps Learning Roadmap¶
Architecture Overview¶
┌─────────────────────────────────────────────────────────┐
│ Developer Workstation │
│ │
│ git push ──► GitHub ──► GitHub Actions CI Pipeline │
│ │ │
│ ┌─────────┴──────────┐ │
│ │ CI Jobs │ │
│ │ ┌──────────────┐ │ │
│ │ │ Lint & Test │ │ │
│ │ │ Build Image │ │ │
│ │ │ Scan & SBOM │ │ │
│ │ │ Push to GHCR │ │ │
│ │ └──────────────┘ │ │
│ └────────┬───────────┘ │
│ │ │
│ ▼ │
│ ┌────────────────┐ │
│ │ Container │ │
│ │ Registry │ │
│ │ (GHCR) │ │
│ └───────┬────────┘ │
│ │ │
│ ┌────────────┼────────────┐ │
│ ▼ ▼ ▼ │
│ ┌────────┐ ┌──────────┐ ┌────────┐ │
│ │ Dev │ │ Staging │ │ Prod │ │
│ │ k3s │ │ k3s/EKS │ │ EKS │ │
│ └────────┘ └──────────┘ └────────┘ │
│ │
│ Helm charts deploy to each environment │
└─────────────────────────────────────────────────────────┘
Pipeline Lifecycle¶
1. Code Change¶
Developer pushes code to GitHub.
2. CI Pipeline Triggers¶
GitHub Actions runs on push to main/develop or on pull requests:
| Job | Purpose |
|---|---|
| lint | Ruff linter and formatter checks |
| test | Pytest with coverage (70% floor) |
| validate | Helm lint, Ansible syntax, YAML lint, Shellcheck |
| docker-build | Multi-stage Docker build, push to GHCR |
| terraform | Terraform fmt and validate |
| security | Trivy vulnerability scan, SBOM |
| dependency-audit | pip-audit on Python dependencies |
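As a sketch, a workflow with this trigger and job layout might look like the following (the file path, Python version, and job wiring are illustrative assumptions, not the repository's actual workflow):

```yaml
# .github/workflows/ci.yml -- illustrative sketch, not the repo's actual file
name: ci

on:
  push:
    branches: [main, develop]
  pull_request:

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"   # assumed version
      - run: pip install -r requirements-dev.txt
      - run: ruff check .
      - run: ruff format --check .

  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -r requirements-dev.txt
      # --cov-fail-under enforces the 70% coverage floor in CI
      - run: pytest --cov=app --cov-fail-under=70
```

The remaining jobs (validate, docker-build, terraform, security, dependency-audit) would follow the same pattern, typically gated with `needs:` so the build only runs after lint and test pass.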
3. Artifact Production¶
- Container image pushed to GHCR with SHA-based tags
- SBOM generated in SPDX format
- Coverage report archived
4. Deployment (current: manual, future: GitOps)¶
- Now: helm upgrade via script
- Future: ArgoCD watches Git for image digest changes
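The future GitOps step could be expressed as an ArgoCD Application that points at the Helm chart in Git; a minimal sketch (the repo URL, chart path, and namespace are assumptions):

```yaml
# Illustrative ArgoCD Application -- repoURL, path, and namespace are assumptions
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: grokdevops
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/grokdevops
    targetRevision: main
    path: devops/helm/grokdevops
    helm:
      valueFiles:
        - values-dev.yaml
  destination:
    server: https://kubernetes.default.svc
    namespace: grokdevops
  syncPolicy:
    automated:
      prune: true     # delete resources removed from Git
      selfHeal: true  # revert manual drift in the cluster
```

With this in place, a merged commit that updates the image digest in values files is enough to trigger a deployment; no one runs helm by hand.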
Artifact Flow¶
Source Code
│
▼
Python Package (requirements.txt)
│
▼
Container Image (multi-stage Dockerfile)
│
├──► GHCR (tagged by git SHA)
├──► SBOM artifact
└──► Vulnerability scan report
│
▼
Helm Release (values per environment)
│
▼
Running Pods in Kubernetes
Environment Promotion Model¶
┌─────────┐ digest ┌───────────┐ approval ┌─────────┐
│ Dev │ ──────────────► │ Staging │ ───────────────► │ Prod │
│ │ auto-promote │ │ manual gate │ │
└─────────┘ └───────────┘ └─────────┘
values-dev.yaml values-staging.yaml values-prod.yaml
replicas: 1 replicas: 2 replicas: 3
NodePort ClusterIP ClusterIP + HPA
debug logging info logging warning logging
Promotion is based on image digest, not mutable tags. This ensures the exact image tested in staging is what runs in production.
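Pinning by digest in the per-environment values might look like this (the values schema below is a hypothetical layout, not the chart's actual keys):

```yaml
# values-staging.yaml -- hypothetical schema; key names are assumptions
image:
  repository: ghcr.io/example/grokdevops
  # Promote this exact digest, never a mutable tag like :latest.
  # The digest below is a placeholder, not a real image.
  digest: sha256:0000000000000000000000000000000000000000000000000000000000000000
```

The chart template would then render the reference as `repository@digest`, so staging and prod resolve to the byte-identical image.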
Infrastructure Provisioning Plan¶
Local (Stage 1 - Now)¶
- k3s single-node cluster
- Ansible playbook bootstraps k3s
- Helm deploys application
AWS (Stage 2 - Future)¶
- Terraform provisions VPC, EKS, IAM
- Managed node groups
- IRSA for pod-level AWS permissions
- GitHub Actions OIDC for keyless CI/CD auth
devops/terraform/
├── modules/
│ ├── vpc/ # VPC, subnets, NAT
│ ├── eks/ # EKS cluster, node groups
│ └── iam/ # Roles, policies, OIDC
└── environments/
├── dev/
├── staging/
└── prod/
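The keyless CI/CD auth mentioned above uses GitHub's OIDC token instead of long-lived AWS keys. A sketch of the relevant workflow fragment (the role ARN and region are placeholder assumptions):

```yaml
# Illustrative CI fragment for keyless AWS auth via OIDC
permissions:
  id-token: write   # allow the job to request an OIDC token
  contents: read

steps:
  - uses: aws-actions/configure-aws-credentials@v4
    with:
      # Placeholder role -- Terraform's iam/ module would create the real one
      role-to-assume: arn:aws:iam::123456789012:role/github-actions-deploy
      aws-region: us-east-1
```

AWS trusts the GitHub OIDC provider, so the role can be scoped to a specific repository and branch; no secrets are stored in GitHub at all.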
Expansion Stages¶
Stage 1: Foundation (Implemented)¶
Skills learned:

- Python web service development
- Container image building (multi-stage)
- GitHub Actions CI pipeline design
- Kubernetes deployment with Helm
- Container security scanning
- SBOM generation
- Dependency auditing
- Git-based workflow

What you have:

- FastAPI service with health/version endpoints
- Production Dockerfile (multi-stage, non-root, healthcheck)
- CI pipeline (lint → test → build → scan)
- Helm chart with per-environment values
- Local deployment scripts for k3s
- Terraform and Ansible scaffolding
Stage 2: Platform Engineering (Scaffold Ready)¶
Skills to learn:

- GitOps with ArgoCD or Flux
- Infrastructure as Code with Terraform
- AWS networking (VPC, subnets, NAT)
- Managed Kubernetes (EKS)
- IAM and security boundaries
- Secrets management (External Secrets Operator)
- Configuration management with Ansible

Implementation path:

1. Install ArgoCD on k3s
2. Create Kustomize overlays
3. Implement Terraform VPC module
4. Implement Terraform EKS module
5. Set up GitHub OIDC for AWS auth
6. Add External Secrets Operator
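For the final step above, an ExternalSecret resource syncs a secret from an external store into the cluster. A minimal sketch (the store name, remote key path, and secret keys are assumptions):

```yaml
# Illustrative ExternalSecret -- store name, remote key, and fields are assumptions
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: grokdevops-secrets
  namespace: grokdevops
spec:
  refreshInterval: 1h   # re-sync from the backing store hourly
  secretStoreRef:
    name: aws-secrets-manager
    kind: ClusterSecretStore
  target:
    name: grokdevops-app-secrets   # the Kubernetes Secret to create
  data:
    - secretKey: DATABASE_URL
      remoteRef:
        key: grokdevops/prod/database-url
```

The application then mounts `grokdevops-app-secrets` as an ordinary Secret; nothing sensitive ever lives in Git.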
Stage 3: Production Patterns (Documentation)¶
Skills to learn:

- Blue/green deployments
- Canary releases (Argo Rollouts)
- Horizontal Pod Autoscaling
- Pod Disruption Budgets
- Distributed worker systems
- Queue-based architectures
- Node upgrade strategies
- Chaos engineering (Litmus, Chaos Mesh)

Implementation path:

1. Replace Deployment with Argo Rollout
2. Configure analysis templates for canary
3. Add HPA manifests
4. Create worker deployment with queue consumer
5. Implement node drain/upgrade playbooks
6. Add chaos experiments
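The HPA manifests in step 3 would pair with the prod values (replicas: 3, ClusterIP + HPA) roughly like this; the Deployment name and utilization target are assumptions:

```yaml
# Illustrative HPA for prod -- target name and thresholds are assumptions
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: grokdevops
  namespace: grokdevops
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: grokdevops
  minReplicas: 3        # matches the prod replica floor
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out above 70% average CPU
```

Resource requests must be set on the pods for CPU utilization targets to work, so the HPA usually lands alongside a resources block in the chart values.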
Observability (Implemented)¶
Metrics: Application -> Prometheus -> Grafana¶
FastAPI (/metrics endpoint)
│
▼
ServiceMonitor (selects app Service)
│
▼
Prometheus (scrapes every 30s)
│
▼
Grafana (dashboards, alerts)
Metrics exposed: http_requests_total, http_request_duration_seconds
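The ServiceMonitor in the flow above is a Prometheus Operator resource that selects the app's Service by label. A sketch, assuming the label and port name below (both are assumptions about the chart):

```yaml
# Illustrative ServiceMonitor -- label selector and port name are assumptions
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: grokdevops
  namespace: grokdevops
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: grokdevops
  endpoints:
    - port: http        # named port on the app's Service
      path: /metrics    # the FastAPI metrics endpoint
      interval: 30s     # matches the 30s scrape cadence above
```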
Logging: Application -> Promtail -> Loki -> Grafana¶
Application (stdout/stderr)
│
▼
Promtail (DaemonSet, collects container logs)
│
▼
Loki (stores and indexes by labels)
│
▼
Grafana (LogQL queries)
Tracing: Application -> Tempo -> Grafana¶
Install with: ./devops/scripts/install-observability.sh
See observability.md for details.
Security Pipeline (Expanding)¶
Implemented¶
- Container vulnerability scanning (Trivy)
- SBOM generation (Anchore/Syft)
- Dependency auditing (pip-audit)
Stage 2 Additions¶
- Image signing with Cosign/Sigstore
- Admission control with Kyverno or OPA Gatekeeper
- Network policies
- RBAC hardening
- Secret rotation
Stage 3 Additions¶
- Runtime security (Falco)
- eBPF-based monitoring
- Supply chain security (SLSA)
- Penetration testing automation
Quick Reference¶
Run the CI pipeline locally¶
# Install dev dependencies
pip install -r requirements-dev.txt
# Lint
ruff check .
ruff format --check .
# Test
pytest --cov=app
# Build image
./devops/docker/build.sh
Deploy to local k3s¶
# Full workflow
./devops/scripts/workflow.sh
# Or step by step
./devops/docker/build.sh
./devops/scripts/deploy-local.sh
# Access
kubectl port-forward -n grokdevops svc/grokdevops 8000:80
curl http://localhost:8000/health