Portal | Level: L1: Foundations | Topics: AI Tools for DevOps | Domain: DevOps & Tooling
AI Tools for DevOps - Primer¶
Why This Matters¶
AI tools are reshaping how DevOps engineers write code, debug infrastructure, and handle incidents. Knowing how to use ChatGPT, Claude Code, GitHub Copilot, and Codex effectively is becoming as fundamental as knowing kubectl or terraform. The engineers who learn to prompt well ship faster and debug harder problems.
The AI Tool Landscape for DevOps¶
ChatGPT (OpenAI)¶
Conversational AI accessed via browser or app. Best for:

- Explaining concepts and troubleshooting errors
- Drafting documentation, runbooks, and postmortems
- Brainstorming architecture and comparing approaches
- Learning new tools ("How does ArgoCD handle rollbacks?")
Key features for DevOps:

- Custom instructions: set persistent context about your stack
- File uploads: share config files, logs, and error output for analysis
- Custom GPTs: build specialized assistants (IaC reviewer, incident commander)
- Projects: group related conversations with pinned reference files
ChatGPT Codex (OpenAI)¶
Agentic coding tool that works asynchronously in a sandboxed cloud environment:

- Reads your repository, creates branches, writes code, runs tests
- Generates pull requests with descriptions
- Works in the background (queue tasks and come back)
- Sandboxed, so it can't accidentally affect production
Best for: bulk refactoring, adding tests across a repo, repetitive multi-file changes.
Claude Code (Anthropic)¶
Terminal-native AI agent that runs in your shell:

- Direct filesystem access: reads and edits your files
- Runs commands with your permission (kubectl, terraform, docker, etc.)
- Git integration: commits, pushes, creates PRs
- Reads CLAUDE.md for project-specific context and conventions
- Permission model: you approve or deny each action
Best for: interactive debugging, learning a codebase, one-off scripts, incident response.
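Because Claude Code reads CLAUDE.md on startup, seeding one early pays off. A hypothetical minimal example; the paths, chart layout, and rules below are invented for illustration:

```markdown
# CLAUDE.md (hypothetical example)

## Stack
- Terraform 1.5+ lives in infra/; Python 3.11 tooling in tools/
- Kubernetes deploys go through Helm 3 charts in charts/

## Rules for the agent
- Run `terraform fmt` and `ruff check` before committing
- Never run `terraform apply`; stop after `plan` and ask
- Use read-only kubectl commands unless explicitly approved
```

Keeping the file short and imperative works better than long prose: the agent re-reads it every session, so every line is a standing instruction.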
GitHub Copilot¶
AI code completion integrated into your editor (VS Code, JetBrains, Neovim):

- Inline suggestions as you type
- Understands file context and project structure
- Copilot Chat for conversational help within the editor
- Works well for Terraform, YAML, Python, Dockerfiles, and shell scripts
Best for: writing code faster, boilerplate generation, completing patterns.
Gotcha: AI-generated Terraform and IAM policies almost always err on the side of overly permissive access. A common pattern: you ask for "an IAM policy to access S3" and get `s3:*` on `*`. Always review generated IAM for least privilege: replace wildcard actions with specific ones (`s3:GetObject`, `s3:PutObject`), and scope resources to specific ARNs rather than `*`.

Remember: The AI tool selection heuristic: "If it needs your files, use Claude Code. If it needs your editor, use Copilot. If it needs a conversation, use ChatGPT. If it needs to run autonomously across many files, use Codex." Match the tool to the interaction pattern, not the task category.
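As an illustration of the fix, a least-privilege version of that S3 policy might look like this (the bucket name and statement ID are placeholders for your actual requirements):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AppObjectAccess",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::my-app-bucket/*"
    }
  ]
}
```

Contrast with the typical AI-generated version: `"Action": "s3:*"` with `"Resource": "*"`, which also grants bucket deletion, policy changes, and access to every bucket in the account.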
Prompt Engineering Fundamentals¶
Core Principles¶
1. Be Specific and Contextual
Bad:
"Write a Dockerfile"
Good:
"Write a multi-stage Dockerfile for a Python 3.11 FastAPI application. The production image should use python:3.11-slim, run as a non-root user (UID 1000), expose port 8000, and use uvicorn as the entrypoint."
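A response satisfying the "good" prompt might look roughly like this sketch; the `app.main:app` module path and `requirements.txt` filename are assumptions about the project layout:

```dockerfile
# Build stage: install dependencies into a virtualenv
FROM python:3.11-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN python -m venv /venv && /venv/bin/pip install --no-cache-dir -r requirements.txt

# Production stage: copy only the virtualenv and application source
FROM python:3.11-slim
COPY --from=builder /venv /venv
WORKDIR /app
COPY . .
USER 1000
EXPOSE 8000
ENTRYPOINT ["/venv/bin/uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
```

Every constraint in the prompt maps to a concrete line: slim base image, non-root UID, exposed port, uvicorn entrypoint. That mapping is what makes specific prompts reviewable.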
2. Provide Examples (Few-Shot Prompting)
When you want output in a specific format, show the AI what you expect:
"Convert these environment variables to Kubernetes ConfigMap entries. Example:
Input: DATABASE_URL=postgres://localhost:5432/mydb
Output: DATABASE_URL: "postgres://localhost:5432/mydb"
Now convert these: APP_PORT=8000, LOG_LEVEL=info, CACHE_TTL=300"
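With the format pinned down by one example, the model's output for the remaining variables might look like this (the ConfigMap name is a placeholder):

```yaml
# Hypothetical output for the few-shot prompt above
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  APP_PORT: "8000"
  LOG_LEVEL: "info"
  CACHE_TTL: "300"
```

Note the quoting: ConfigMap `data` values must be strings, so numeric-looking values need quotes, which is exactly the kind of detail a good few-shot example locks in.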
3. Use Role/Persona Framing
"You are a senior SRE with 10 years of experience in Kubernetes and AWS. Review this Helm chart and identify potential issues with resource limits, security contexts, and pod disruption budgets."
4. Chain of Thought
Ask the model to reason step-by-step for complex problems:
"I'm getting OOMKilled pods in my Kubernetes cluster. Walk me through a systematic debugging approach, step by step, including what commands to run and what to look for at each stage."
5. Constrain the Output
"Write a bash script that checks disk usage on /var/log. Requirements:
- Alert if usage exceeds 80%
- Output must be a single function
- No external dependencies
- Include error handling for missing directories
- Keep it under 30 lines"
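One plausible answer to that constrained prompt, as a sketch. It assumes POSIX `df -P` and uses `awk` for parsing; strictly, `awk` is an external tool, so an answer honoring "no external dependencies" to the letter would parse `df` output with bash built-ins instead:

```shell
#!/usr/bin/env bash
# Hypothetical answer sketch for the constrained disk-usage prompt.
check_disk_usage() {
  local dir="${1:-/var/log}" usage
  if [ ! -d "$dir" ]; then
    echo "ERROR: directory $dir not found" >&2
    return 2
  fi
  # df -P guarantees one line per filesystem; field 5 is the use%
  usage=$(df -P "$dir" | awk 'NR==2 { sub(/%/, "", $5); print $5 }')
  if [ "$usage" -gt 80 ]; then
    echo "ALERT: $dir filesystem at ${usage}% (threshold 80%)"
    return 1
  fi
  echo "OK: $dir filesystem at ${usage}%"
}
```

Checking the output against each listed requirement (single function, error handling, distinct exit codes for alert vs. missing directory) is the review step that constraint-based prompts make possible.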
Advanced Techniques¶
System Prompts / Custom Instructions
Set persistent context so you don't repeat yourself every conversation:
"I'm a DevOps engineer working with:
- AWS (EKS, RDS, S3, IAM)
- Terraform 1.5+ for IaC
- Helm 3 for Kubernetes deployments
- Python 3.11 for tooling and services
- GitHub Actions for CI/CD
When generating IaC:
- Use variables for all configurable values
- Include tags on every resource
- Follow least-privilege for IAM policies"
Prompt Chaining (Multi-Step Workflows)
Break complex tasks into a sequence:
- "Read this Terraform state file and list all resources and dependencies."
- "Design a migration plan to split this into three state files: networking, compute, data."
- "Generate the `terraform state mv` commands for phase 1."
- "Write a validation script to verify resources after migration."
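The output of the third step might look like this; the resource addresses are hypothetical, and `-state-out` writes the moved resources into a separate state file:

```shell
# Hypothetical phase-1 moves: networking resources into their own state
terraform state mv -state-out=networking.tfstate aws_vpc.main aws_vpc.main
terraform state mv -state-out=networking.tfstate 'aws_subnet.private[0]' 'aws_subnet.private[0]'
terraform state mv -state-out=networking.tfstate 'aws_nat_gateway.this[0]' 'aws_nat_gateway.this[0]'
```

Because each step's output feeds the next prompt, you can review and correct at every stage instead of auditing one giant generated migration at the end.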
Negative Prompting
Specify what you don't want:
"Write a GitHub Actions workflow for Python CI. Do NOT:
- Use self-hosted runners
- Cache Docker layers
- Include deployment steps
- Use matrix builds"
Asking for Tradeoffs
"Compare these secrets management approaches for EKS:
1. AWS Secrets Manager + External Secrets Operator
2. HashiCorp Vault + CSI driver
3. Sealed Secrets
For each: setup complexity, operational overhead, cost at scale, failure modes."
DevOps-Specific Prompt Patterns¶
Infrastructure as Code Generation¶
"Generate a Terraform module for an AWS VPC with:
- CIDR block parameterized as a variable
- 3 public subnets and 3 private subnets across AZs
- NAT gateway in each public subnet
- Flow logs enabled to CloudWatch
- Tags: Environment and Project as variables
- Output the VPC ID, subnet IDs, and NAT gateway IDs
Use Terraform 1.5+ syntax with required_providers block."
Incident Debugging¶
"Here is the error from my application logs:
[paste error]
Context:
- Running on EKS 1.28
- Python 3.11 with FastAPI
- Using RDS PostgreSQL 15
- This started happening after deploying version 2.4.1
What are the most likely causes? List in order of probability
with a diagnostic command for each."
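Whatever ranking the model produces, the diagnostic commands it pairs with each cause usually come from this standard kubectl family (namespace and pod names are hypothetical):

```shell
kubectl get pods -n myapp                      # which pods are failing, and how often?
kubectl describe pod api-7d9f -n myapp         # events: OOMKilled? failed probes? image pulls?
kubectl logs api-7d9f -n myapp --previous      # logs from the crashed container, not the restart
kubectl top pod -n myapp                       # current memory usage vs. configured limits
kubectl get events -n myapp --sort-by=.lastTimestamp
```

Recognizing these lets you sanity-check the AI's suggestions: a proposed diagnostic that isn't reachable from commands like these deserves extra scrutiny.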
Code Review Assistance¶
"Review this Python function for:
1. Security vulnerabilities (OWASP Top 10)
2. Performance issues
3. Error handling gaps
4. Python best practices
Format as a numbered list with severity (high/medium/low) and suggested fix."
Pipeline/CI-CD Help¶
"Write a GitHub Actions workflow that:
- Triggers on push to main and pull requests
- Runs ruff linting and pytest with coverage
- Builds a Docker image and pushes to ECR
- Deploys to EKS staging on merge to main
- Uses OIDC for AWS authentication (no long-lived keys)
- Caches pip dependencies between runs"
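A skeleton matching that prompt might start like this; the role ARN, region, and repository are placeholders, and the image build and EKS deploy steps are omitted:

```yaml
name: ci
on:
  push: { branches: [main] }
  pull_request:
permissions:
  id-token: write   # required for OIDC federation to AWS (no long-lived keys)
  contents: read
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: "3.11", cache: pip }   # caches pip between runs
      - run: pip install -r requirements.txt
      - run: ruff check .
      - run: pytest --cov
  build:
    if: github.ref == 'refs/heads/main'
    needs: test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/ci-deploy   # placeholder ARN
          aws-region: us-east-1
      - uses: aws-actions/amazon-ecr-login@v2
      # docker build/push and EKS staging deploy steps would follow
```

The `permissions` block is the piece AI generators most often get wrong: without `id-token: write`, OIDC authentication fails, and models sometimes "fix" that by falling back to long-lived access keys.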
Choosing the Right Tool¶
| Scenario | Best Tool | Why |
|---|---|---|
| Quick concept question | ChatGPT | Conversational, fast |
| Debugging with live commands | Claude Code | Has your terminal |
| Bulk refactoring across files | Codex | Async, autonomous |
| Writing code in editor | Copilot | Inline, zero friction |
| Architecture discussion | ChatGPT | Good at tradeoffs |
| Learning a new codebase | Claude Code | Reads your files |
| Writing a one-off script | Claude Code | Interactive iteration |
| Adding tests across repo | Codex | Batch autonomous work |
| Drafting a postmortem | ChatGPT | Strong at structured docs |
War story: An engineer used AI to generate a GitHub Actions workflow that ran `terraform apply -auto-approve` on merge to main, with no plan review step. A typo in a variable caused the workflow to destroy a production RDS instance. The fix: always generate `terraform plan` as a separate step with human review before `apply`. AI-generated IaC pipelines should be treated as drafts that need security review, not production-ready configurations.

Fun fact: GitHub Copilot launched as a technical preview in June 2021, powered by OpenAI Codex, and was one of the first mass-market AI coding tools. By 2023, GitHub reported that Copilot was generating ~46% of the code in files where it was active. The CLAUDE.md convention used by Claude Code was inspired by the `.github/copilot-instructions.md` pattern: giving AI tools project-specific context improves output quality dramatically.
Anti-Patterns to Avoid¶
- Vague prompts: "Make this better" - better how? Be specific.
- No context: Pasting code without explaining the stack, constraints, or goal.
- Trusting blindly: Always review AI-generated IaC, scripts, and configs. Check for:
- Hardcoded credentials or secrets
- Overly permissive IAM policies or security groups
- Missing error handling
- Resources without proper tagging
- One-shot for complex tasks: Break large requests into smaller, iterative prompts.
- Ignoring token limits: Provide relevant excerpts rather than entire files.
Security Considerations¶
- Never paste secrets, API keys, or credentials into any AI tool
- Sanitize logs before sharing (remove IPs, usernames, internal hostnames)
- Review IAM policies generated by AI - they tend to be overly permissive
- Check Terraform plans before applying AI-generated infrastructure code
- Understand data residency - relevant for SOC2, HIPAA compliance
- AI-generated IaC should always go through `terraform plan` review, never `terraform apply -auto-approve`
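The log-sanitizing step above can be partly automated. A hypothetical pre-paste scrubber that masks IPs, email addresses, and EC2-style internal hostnames; the patterns are illustrative only, so extend them for whatever identifiers your logs actually contain:

```shell
# Hypothetical helper: scrub common identifiers from a log excerpt
# before pasting it into an AI tool. Not exhaustive - review the
# output manually before sharing.
sanitize_log() {
  sed -E \
    -e 's/[0-9]{1,3}(\.[0-9]{1,3}){3}/<IP>/g' \
    -e 's/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/<EMAIL>/g' \
    -e 's/\bip-[0-9-]+/<HOST>/g'
}
```

Usage: `kubectl logs api-pod | sanitize_log | less`, then copy from the masked output. Masking rather than deleting keeps the log structure intact, so the AI can still reason about which lines refer to the same host.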
Wiki Navigation¶
Next Steps¶
- AI-Assisted DevOps Cookbook (Reference, L1)
Related Content¶
- AI-Assisted DevOps Cookbook (Reference, L1) — AI Tools for DevOps
- Generativeai Flashcards (CLI) (flashcard_deck, L1) — AI Tools for DevOps
- The Ops of AI/ML Workloads (Topic Pack, L2) — AI Tools for DevOps