
Level: L1: Foundations | Topics: AI Tools for DevOps | Domain: DevOps & Tooling

AI-Assisted DevOps Cookbook

Concrete recipes for common DevOps tasks using AI tools (ChatGPT, Codex, Claude Code). Each recipe includes the prompt, what to expect, and what to watch out for.

Recipes


1. Generate a Terraform Module From Scratch

Tool: ChatGPT or Claude Code
Time saved: 30-60 min

Prompt:

Generate a Terraform module for an AWS RDS PostgreSQL instance.

Requirements:
- PostgreSQL 15, db.t3.medium (parameterized)
- Multi-AZ for production, single-AZ for dev (variable toggle)
- Encrypted at rest with KMS (key ARN as variable)
- Automated backups, 7-day retention
- Private subnet group (subnet IDs as variable)
- Security group allowing inbound 5432 from a provided CIDR list
- Parameter group with: log_statement=all, log_min_duration_statement=1000
- Outputs: endpoint, port, security group ID

Variables should have descriptions and sensible defaults where appropriate.
Include a versions.tf with required_providers (aws >= 5.0).

Watch out for:
- AI may use deprecated resource arguments - check against current AWS provider docs
- Security groups may be too permissive - verify CIDR blocks and egress rules
- Check that deletion_protection is enabled by default
- Verify the parameter group family matches the engine version
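
Some of these checks can be partly automated before a human review. The following is a minimal sketch, not a substitute for terraform validate or a real policy tool: it scans the generated HCL as plain text, so it assumes literal values (not variables) for engine_version and family. The function name review_terraform and the warning strings are illustrative assumptions.

```python
import re


def review_terraform(hcl: str) -> list[str]:
    """Heuristic text scan of a generated RDS module; returns warnings."""
    warnings = []

    # deletion_protection should be set, and not hard-coded to false.
    if not re.search(r"deletion_protection\s*=", hcl):
        warnings.append("deletion_protection is not set")
    elif re.search(r"deletion_protection\s*=\s*false", hcl):
        warnings.append("deletion_protection is disabled")

    # A world-open CIDR anywhere in the module deserves a manual look
    # (it may be a legitimate egress rule, so this is only a prompt to check).
    if "0.0.0.0/0" in hcl:
        warnings.append("0.0.0.0/0 appears in a CIDR list")

    # The parameter group family (e.g. postgres15) should match the
    # engine major version. Only literal strings are detected here.
    engine = re.search(r'engine_version\s*=\s*"(\d+)', hcl)
    family = re.search(r'family\s*=\s*"postgres(\d+)"', hcl)
    if engine and family and engine.group(1) != family.group(1):
        warnings.append(
            f"parameter group family postgres{family.group(1)} "
            f"does not match engine version {engine.group(1)}"
        )
    return warnings
```

An empty list only means none of these specific patterns were found; the deprecated-argument check still needs the provider docs.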


2. Write a GitHub Actions Workflow

Tool: ChatGPT or Claude Code
Time saved: 20-40 min

Prompt:

Write a GitHub Actions workflow called "CI" that runs on push to main
and on pull requests. Steps:

1. Checkout code
2. Set up Python 3.11
3. Cache pip dependencies (hash requirements.txt for cache key)
4. Install dependencies from requirements.txt
5. Run ruff check .
6. Run ruff format --check .
7. Run pytest --cov=app --cov-fail-under=70 --tb=short -q
8. Upload coverage report as artifact

Use ubuntu-latest runner. Pin action versions to specific SHAs
for supply chain security (not @v4, use @<sha>).

Watch out for:
- AI rarely pins to SHAs by default - you'll likely need to follow up
- Check that cache paths match your actual pip cache location
- Verify the workflow sets a permissions: block with least privilege
- Make sure it doesn't use outdated action versions such as actions/checkout@v2
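
The SHA-pinning and permissions checks are easy to script. Here is a small sketch that treats the workflow file as text and uses regular expressions rather than a YAML parser, so it assumes conventional formatting (one uses: per line). The function names are illustrative, not part of any GitHub tooling.

```python
import re

# Matches "uses: owner/repo@ref" with or without a leading list dash.
USES = re.compile(r"^[ \t]*-?[ \t]*uses:[ \t]*(\S+)", re.MULTILINE)
# A full-length commit SHA is 40 lowercase hex characters.
FULL_SHA = re.compile(r"@[0-9a-f]{40}$")


def unpinned_actions(workflow: str) -> list[str]:
    """Return uses: references not pinned to a full 40-character SHA."""
    refs = USES.findall(workflow)
    # Local actions (./path) carry no version, so they are skipped.
    return [r for r in refs if not r.startswith("./") and not FULL_SHA.search(r)]


def has_permissions_block(workflow: str) -> bool:
    """True if a permissions: key appears anywhere in the workflow."""
    return re.search(r"^[ \t]*permissions:", workflow, re.MULTILINE) is not None
```

Run against the generated workflow, unpinned_actions returns entries such as actions/checkout@v4 that still need a follow-up prompt. It does not verify that a SHA actually exists in the action's repository.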


3. Debug a Kubernetes CrashLoopBackOff

Tool: Claude Code (best - can run kubectl) or ChatGPT (paste logs)

Claude Code prompt:

My pod app/order-api is in CrashLoopBackOff. Help me debug it.
Run kubectl commands to check:
1. Pod events and status
2. Container logs (current and previous)
3. Resource limits vs actual usage
4. ConfigMap/Secret mounts
5. Liveness/readiness probe config

Diagnose the issue and suggest a fix.

ChatGPT prompt (if you can't use Claude Code):

My pod is in CrashLoopBackOff. Here's the output of:

kubectl describe pod order-api-7d8f9c6b4-xk2lm:
[paste output]

kubectl logs order-api-7d8f9c6b4-xk2lm --previous:
[paste output]

What's the root cause and how do I fix it?

Watch out for:
- Don't run kubectl commands against production without reviewing them first
- AI may suggest kubectl delete pod as a fix - that's a band-aid, not a fix
- Check whether the issue is in the app code, the config, or the infrastructure
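
One way to make "review before running" concrete is to allowlist read-only kubectl verbs and require explicit approval for anything else. The sketch below is an assumption about how you might gate suggested commands; it only inspects the token immediately after kubectl, so commands with global flags placed before the verb are treated as not read-only.

```python
import shlex

# Verbs that only read cluster state. This allowlist is deliberately
# small and is an assumption; extend it to fit your own policy.
READ_ONLY_VERBS = {"get", "describe", "logs", "top"}


def is_read_only(command: str) -> bool:
    """True if the command is kubectl followed directly by a read-only verb."""
    parts = shlex.split(command)
    if len(parts) < 2 or parts[0] != "kubectl":
        return False
    return parts[1] in READ_ONLY_VERBS
```

With this gate, kubectl describe and kubectl logs --previous pass, while kubectl delete pod is held for a human decision.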


4. Convert Docker Compose to Kubernetes Manifests

Tool: ChatGPT or Claude Code
Time saved: 45-90 min

Prompt:

Convert this docker-compose.yml to Kubernetes manifests.
Create separate YAML files for each resource.

For each service, generate:
- Deployment with resource requests/limits
- Service (ClusterIP for internal, LoadBalancer for external)
- ConfigMap for environment variables
- PersistentVolumeClaim for any volumes

Don't use kompose - write clean, idiomatic Kubernetes YAML.
Add standard labels (app, component, version).
Set securityContext: runAsNonRoot where possible.

[paste docker-compose.yml]

Watch out for:
- Volume mounts need real StorageClass names for your cluster
- AI won't know your ingress controller - you'll need to adapt Services
- Environment variables with secrets should use Secret, not ConfigMap
- Resource limits are AI guesses - adjust based on actual usage data
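
To catch secrets that ended up in a ConfigMap, a quick pass over the environment variable names helps. This sketch splits variables by a name-based heuristic; the pattern list and the function name split_env are assumptions, and values that embed credentials (for example a database URL containing a password) will not be detected.

```python
import re

# Name fragments that usually indicate a secret. Heuristic only.
SECRET_HINT = re.compile(
    r"(PASSWORD|PASSWD|SECRET|TOKEN|API_?KEY|PRIVATE_KEY|CREDENTIAL)",
    re.IGNORECASE,
)


def split_env(env: dict[str, str]) -> tuple[dict[str, str], dict[str, str]]:
    """Split env vars into (configmap_data, secret_data) by key name."""
    config: dict[str, str] = {}
    secret: dict[str, str] = {}
    for key, value in env.items():
        target = secret if SECRET_HINT.search(key) else config
        target[key] = value
    return config, secret
```

Anything in the second dictionary belongs in a Secret; anything in the first still deserves a glance for embedded credentials.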


5. Write an Incident Postmortem

Tool: ChatGPT
Time saved: 30-45 min

Prompt:

Write a blameless postmortem for the following incident.
Use our template: Summary, Detection, Timeline, Root Cause,
Impact, What Went Well, What Went Wrong, Action Items.

Facts:
- Date: 2025-11-15, 09:30-10:45 UTC
- Service: checkout-api (EKS, us-east-1)
- Symptoms: HTTP 500 errors on /api/checkout, 30% failure rate
- Detection: Datadog alert on error rate > 5% fired at 09:35
- On-call acknowledged at 09:38, began investigating
- 09:42: Identified correlation with deploy at 09:28
- 09:50: Attempted rollback, but ArgoCD sync was stuck
- 10:00: Manually reverted the Helm release with helm rollback
- 10:15: Error rate returned to baseline
- 10:45: Confirmed all queued orders were processed
- Root cause: New env var PAYMENT_GATEWAY_URL was required
  but not added to the production ConfigMap, causing nil pointer
  in payment client initialization
- Customer impact: ~450 failed checkout attempts, $12K estimated
  lost revenue

Action items should be specific, assigned (use placeholder names),
and have priority (P1/P2/P3).
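
Generated postmortems sometimes get the timeline arithmetic wrong, so it is worth recomputing the durations from the facts you supplied. The sketch below uses the timestamps from the prompt above. The metric names (time to detect, and so on) are illustrative labels, and treating 09:30 as the start of errors is an assumption based on the incident window.

```python
from datetime import datetime


def minutes_between(start: str, end: str) -> int:
    """Whole minutes between two HH:MM timestamps on the same day."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


# Timestamps copied from the incident facts (all UTC, same day).
timeline = {
    "deploy": "09:28",
    "errors_start": "09:30",
    "alert": "09:35",
    "ack": "09:38",
    "baseline": "10:15",
    "closed": "10:45",
}

time_to_detect = minutes_between(timeline["errors_start"], timeline["alert"])
time_to_ack = minutes_between(timeline["alert"], timeline["ack"])
time_to_recover = minutes_between(timeline["errors_start"], timeline["baseline"])
incident_duration = minutes_between(timeline["errors_start"], timeline["closed"])
```

Compare these values with any durations quoted in the generated Summary and Timeline sections before publishing.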


6. Create a Helm Values Diff Report

Tool: Claude Code
Time saved: 15-20 min

Prompt:

Compare values-dev.yaml, values-staging.yaml, and values-prod.yaml
in devops/helm/. Create a summary table showing the differences
across environments. Flag any values that look wrong:
- Prod with fewer replicas than staging
- Dev with production-like resource limits
- Missing values in any environment
- Security settings that differ unexpectedly
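
To cross-check the AI's summary table, the same comparison can be sketched deterministically. The example below works on already-parsed dictionaries; loading the YAML files themselves would need a parser such as PyYAML, which is assumed rather than shown. The function names and the "<missing>" marker are illustrative choices.

```python
MISSING = "<missing>"


def flatten(values: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into dotted keys: {"a": {"b": 1}} -> {"a.b": 1}."""
    flat = {}
    for key, value in values.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def diff_environments(envs: dict[str, dict]) -> dict[str, dict[str, object]]:
    """Return {dotted_key: {env: value}} for keys that differ or are missing."""
    flat = {name: flatten(values) for name, values in envs.items()}
    all_keys = sorted(set().union(*flat.values()))
    report = {}
    for key in all_keys:
        row = {name: flat[name].get(key, MISSING) for name in envs}
        # repr() lets unhashable values such as lists be compared safely.
        if len({repr(v) for v in row.values()}) > 1:
            report[key] = row
    return report
```

Each row of the result maps to one line of the summary table, and rules such as "prod has fewer replicas than staging" become simple comparisons on a row.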


7. Generate Ansible Role From Existing Commands

Tool: ChatGPT or Claude Code
Time saved: 30-60 min

Prompt:

I currently run these commands manually to set up a new app server.
Convert this into an Ansible role called 'app-server'.

Commands I run:
apt update && apt upgrade -y
apt install -y nginx python3.11 python3.11-venv certbot
useradd -m -s /bin/bash deploy
mkdir -p /opt/app /var/log/app
chown deploy:deploy /opt/app /var/log/app
cp nginx.conf /etc/nginx/sites-available/app
ln -s /etc/nginx/sites-available/app /etc/nginx/sites-enabled/
systemctl enable --now nginx
ufw allow 22/tcp
ufw allow 80/tcp
ufw allow 443/tcp
ufw enable

Make it idempotent. Use handlers for service restarts.
Template the nginx config with variables for server_name and upstream_port.
Add a defaults/main.yml with sensible defaults.
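
A common failure mode is a role that wraps the original commands in shell or command tasks, which is not idempotent. The sketch below flags such tasks in an already-parsed task list (reading tasks/main.yml would need a YAML parser, assumed here). It treats a creates/removes argument or a changed_when key as a sufficient guard, which is a simplifying assumption; the task names and the bootstrap path in the usage are hypothetical.

```python
# Modules that run arbitrary commands and report "changed" on every run
# unless guarded.
NON_IDEMPOTENT_MODULES = {"shell", "command", "raw"}


def risky_tasks(tasks: list[dict]) -> list[str]:
    """Return names of tasks that shell out without an idempotency guard."""
    flagged = []
    for task in tasks:
        for key, args in task.items():
            # Accept both short names and fully qualified builtin names.
            module = key.removeprefix("ansible.builtin.")
            if module not in NON_IDEMPOTENT_MODULES:
                continue
            guarded = "changed_when" in task or (
                isinstance(args, dict) and ("creates" in args or "removes" in args)
            )
            if not guarded:
                flagged.append(task.get("name", "<unnamed>"))
    return flagged
```

For the commands in this recipe, a task that runs ufw enable through command would be flagged, which is a cue to ask the AI for a dedicated module or an explicit guard instead.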


8. Security Audit a Dockerfile

Tool: Claude Code or ChatGPT
Time saved: 15-30 min

Prompt:

Audit this Dockerfile for security issues. Check against:
1. Running as root
2. Using latest tags
3. Unnecessary packages installed
4. Secrets in build args or env
5. Missing health check
6. Excessive file permissions
7. Not using multi-stage build
8. Including dev dependencies in production image
9. Missing .dockerignore considerations
10. Base image CVE exposure

Rate each finding as Critical/High/Medium/Low.
Provide the fixed Dockerfile after your review.

[paste Dockerfile]
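
Several items on this checklist (root user, latest tags, secrets in ARG/ENV, missing health check) can be verified mechanically, which gives you a baseline to compare against the AI's findings. The following is a rough line-based sketch under stated assumptions: it ignores line continuations, cannot see the base image's default user, and does not scan for CVEs. The function name and finding strings are illustrative.

```python
import re

SECRET_HINT = re.compile(r"(PASSWORD|SECRET|TOKEN|API_?KEY|PRIVATE_KEY)", re.IGNORECASE)


def audit_dockerfile(text: str) -> list[str]:
    """Line-based heuristic audit; returns findings in file order."""
    findings = []
    stages = set()          # names introduced with "FROM ... AS name"
    current_user = None     # reset at each FROM so only the final stage counts
    has_healthcheck = False

    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        keyword = parts[0].upper()
        rest = parts[1] if len(parts) > 1 else ""

        if keyword == "FROM":
            current_user = None
            tokens = [t for t in rest.split() if not t.startswith("--")]
            if not tokens:
                continue
            image = tokens[0]
            unpinned = "@sha256:" not in image and (
                ":" not in image or image.endswith(":latest")
            )
            if image not in stages and image != "scratch" and unpinned:
                findings.append(f"unpinned base image: {image}")
            if len(tokens) >= 3 and tokens[1].upper() == "AS":
                stages.add(tokens[2])
        elif keyword == "USER":
            current_user = rest.strip().split(":")[0]
        elif keyword == "HEALTHCHECK":
            has_healthcheck = True
        elif keyword in ("ARG", "ENV"):
            name = re.split(r"[=\s]", rest.strip(), maxsplit=1)[0]
            if SECRET_HINT.search(name):
                findings.append(f"possible secret in {keyword}: {name}")

    if current_user in (None, "root", "0"):
        findings.append("final stage runs as root")
    if not has_healthcheck:
        findings.append("no HEALTHCHECK instruction")
    return findings
```

If the AI's review misses something this script reports, or the reverse, that difference is the part worth a closer manual look.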


Tips for All Recipes

  1. Always review before applying: No recipe output should go straight to production
  2. Iterate: First pass is a starting point. Follow up with "now also handle X"
  3. Add your context: These prompts are templates - add your specific stack details
  4. Save what works: When a prompt gives great results, save it for reuse
  5. Version your prompts: As tools evolve, update your prompt templates
