
AI Tools for DevOps - Primer

Why This Matters

AI tools are reshaping how DevOps engineers write code, debug infrastructure, and handle incidents. Knowing how to use ChatGPT, Claude Code, GitHub Copilot, and Codex effectively is becoming as fundamental as knowing kubectl or terraform. The engineers who learn to prompt well ship faster and debug harder problems.

The AI Tool Landscape for DevOps

ChatGPT (OpenAI)

Conversational AI accessed via browser or app. Best for:

- Explaining concepts and troubleshooting errors
- Drafting documentation, runbooks, and postmortems
- Brainstorming architecture and comparing approaches
- Learning new tools ("How does ArgoCD handle rollbacks?")

Key features for DevOps:

- Custom instructions: Set persistent context about your stack
- File uploads: Share config files, logs, and error output for analysis
- Custom GPTs: Build specialized assistants (IaC reviewer, incident commander)
- Projects: Group related conversations with pinned reference files

ChatGPT Codex (OpenAI)

Agentic coding tool that works asynchronously in a sandboxed cloud environment:

- Reads your repository, creates branches, writes code, runs tests
- Generates pull requests with descriptions
- Works in the background (queue tasks and come back)
- Sandboxed, so it can't accidentally affect production

Best for: bulk refactoring, adding tests across a repo, repetitive multi-file changes.

Claude Code (Anthropic)

Terminal-native AI agent that runs in your shell:

- Direct filesystem access: reads and edits your files
- Runs commands with your permission (kubectl, terraform, docker, etc.)
- Git integration: commits, pushes, creates PRs
- Reads CLAUDE.md for project-specific context and conventions
- Permission model: you approve or deny each action

Best for: interactive debugging, learning a codebase, one-off scripts, incident response.

GitHub Copilot

AI code completion integrated into your editor (VS Code, JetBrains, Neovim):

- Inline suggestions as you type
- Understands file context and project structure
- Copilot Chat for conversational help within the editor
- Works well for Terraform, YAML, Python, Dockerfiles, and shell scripts

Best for: writing code faster, boilerplate generation, completing patterns.

Gotcha: AI-generated Terraform and IAM policies almost always err on the side of overly permissive access. A common pattern: you ask for "an IAM policy to access S3" and get s3:* on *. Always review generated IAM for least-privilege: replace wildcard actions with specific ones (s3:GetObject, s3:PutObject), and scope resources to specific ARNs rather than *.
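A tightened version of that S3 policy might look like the following Terraform sketch (hedged: the bucket name is a placeholder, not from any real environment):

```hcl
# Least-privilege rewrite of "an IAM policy to access S3".
# "example-app-bucket" is a placeholder bucket name.
data "aws_iam_policy_document" "s3_access" {
  statement {
    effect = "Allow"
    # Specific actions instead of s3:*
    actions = ["s3:GetObject", "s3:PutObject"]
    # Scoped to one bucket's objects instead of Resource: "*"
    resources = ["arn:aws:s3:::example-app-bucket/*"]
  }
}
```

If the workload also needs to list the bucket, add `s3:ListBucket` scoped to the bucket ARN itself (without `/*`) in a separate statement, rather than falling back to wildcards.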

Remember: The AI tool selection heuristic: "If it needs your files, use Claude Code. If it needs your editor, use Copilot. If it needs a conversation, use ChatGPT. If it needs to run autonomously across many files, use Codex." Match the tool to the interaction pattern, not the task category.

Prompt Engineering Fundamentals

Core Principles

1. Be Specific and Contextual

Bad:

"Write a Dockerfile"

Good:

"Write a multi-stage Dockerfile for a Python 3.11 FastAPI application. The production image should use python:3.11-slim, run as a non-root user (UID 1000), expose port 8000, and use uvicorn as the entrypoint."

2. Provide Examples (Few-Shot Prompting)

When you want output in a specific format, show the AI what you expect:

"Convert these environment variables to Kubernetes ConfigMap entries.
Example:
Input: DATABASE_URL=postgres://localhost:5432/mydb
Output:
data:
  DATABASE_URL: postgres://localhost:5432/mydb
Now convert these: APP_PORT=8000, LOG_LEVEL=info, CACHE_TTL=300"
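The conversion in this example is mechanical enough to script directly; a minimal shell sketch (the function name is illustrative, not from the original text):

```shell
#!/usr/bin/env bash
# Sketch: turn KEY=VALUE pairs into ConfigMap-style data entries.
env_to_configmap() {
  echo "data:"
  local pair
  for pair in "$@"; do
    # ${pair%%=*} is the key, ${pair#*=} is the value
    # (splitting on the first "=" keeps URLs like postgres://... intact)
    printf '  %s: %s\n' "${pair%%=*}" "${pair#*=}"
  done
}

env_to_configmap APP_PORT=8000 LOG_LEVEL=info CACHE_TTL=300
# Output:
# data:
#   APP_PORT: 8000
#   LOG_LEVEL: info
#   CACHE_TTL: 300
```

Few-shot prompting shines when the transformation is fuzzier than this, but showing the model one worked input/output pair, as above, pins down the exact format you expect.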

3. Use Role/Persona Framing

"You are a senior SRE with 10 years of experience in Kubernetes and AWS. Review this Helm chart and identify potential issues with resource limits, security contexts, and pod disruption budgets."

4. Chain of Thought

Ask the model to reason step-by-step for complex problems:

"I'm getting OOMKilled pods in my Kubernetes cluster. Walk me through a systematic debugging approach, step by step, including what commands to run and what to look for at each stage."

5. Constrain the Output

"Write a bash script that checks disk usage on /var/log. Requirements:
- Alert if usage exceeds 80%
- Output must be a single function
- No external dependencies
- Include error handling for missing directories
- Keep it under 30 lines"
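A response satisfying those constraints might look like this (a hedged sketch, assuming GNU coreutils `df` for the `--output` flag; function and variable names are illustrative):

```shell
#!/usr/bin/env bash
# Sketch of one possible answer to the constrained prompt above.
# Assumes GNU coreutils (df --output); names are illustrative.
check_disk_usage() {
  local dir="${1:-/var/log}"
  local threshold="${2:-80}"
  if [ ! -d "$dir" ]; then
    echo "ERROR: directory $dir does not exist" >&2
    return 2
  fi
  # df --output=pcent prints a header line plus e.g. " 42%"; strip to digits.
  local usage
  usage=$(df --output=pcent "$dir" | tail -n 1 | tr -dc '0-9')
  if [ "$usage" -gt "$threshold" ]; then
    echo "ALERT: $dir is at ${usage}% (threshold ${threshold}%)"
    return 1
  fi
  echo "OK: $dir is at ${usage}%"
}
# Usage: check_disk_usage /var/log 80
```

Note how each constraint in the prompt maps to a visible feature of the output (single function, missing-directory handling, under 30 lines); constraints the model can't satisfy silently tend to get dropped, so verify each one.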

Advanced Techniques

System Prompts / Custom Instructions

Set persistent context so you don't repeat yourself every conversation:

"I'm a DevOps engineer working with:
- AWS (EKS, RDS, S3, IAM)
- Terraform 1.5+ for IaC
- Helm 3 for Kubernetes deployments
- Python 3.11 for tooling and services
- GitHub Actions for CI/CD

When generating IaC:
- Use variables for all configurable values
- Include tags on every resource
- Follow least-privilege for IAM policies"

Prompt Chaining (Multi-Step Workflows)

Break complex tasks into a sequence:

  1. "Read this Terraform state file and list all resources and dependencies."
  2. "Design a migration plan to split this into three state files: networking, compute, data."
  3. "Generate the terraform state mv commands for phase 1."
  4. "Write a validation script to verify resources after migration."

Negative Prompting

Specify what you don't want:

"Write a GitHub Actions workflow for Python CI. Do NOT:
- Use self-hosted runners
- Cache Docker layers
- Include deployment steps
- Use matrix builds"

Asking for Tradeoffs

"Compare these secrets management approaches for EKS:
1. AWS Secrets Manager + External Secrets Operator
2. HashiCorp Vault + CSI driver
3. Sealed Secrets

For each: setup complexity, operational overhead, cost at scale, failure modes."

DevOps-Specific Prompt Patterns

Infrastructure as Code Generation

"Generate a Terraform module for an AWS VPC with:
- CIDR block parameterized as a variable
- 3 public subnets and 3 private subnets across AZs
- NAT gateway in each public subnet
- Flow logs enabled to CloudWatch
- Tags: Environment and Project as variables
- Output the VPC ID, subnet IDs, and NAT gateway IDs
Use Terraform 1.5+ syntax with required_providers block."
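A good response to this prompt would open with the module interface rather than the resources; a hedged sketch of that skeleton (resource bodies elided, names illustrative):

```hcl
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 5.0"
    }
  }
}

variable "cidr_block" {
  type        = string
  description = "CIDR block for the VPC"
}

variable "environment" {
  type = string
}

variable "project" {
  type = string
}

# ... aws_vpc, subnet, NAT gateway, and flow log resources go here ...

output "vpc_id" {
  value = aws_vpc.this.id
}
```

Reviewing the generated interface first (variables, outputs, provider pins) is a quick way to confirm the model understood the prompt before reading hundreds of lines of resources.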

Incident Debugging

"Here is the error from my application logs:
[paste error]

Context:
- Running on EKS 1.28
- Python 3.11 with FastAPI
- Using RDS PostgreSQL 15
- This started happening after deploying version 2.4.1

What are the most likely causes? List in order of probability
with a diagnostic command for each."

Code Review Assistance

"Review this Python function for:
1. Security vulnerabilities (OWASP Top 10)
2. Performance issues
3. Error handling gaps
4. Python best practices

Format as a numbered list with severity (high/medium/low) and suggested fix."

Pipeline/CI-CD Help

"Write a GitHub Actions workflow that:
- Triggers on push to main and pull requests
- Runs ruff linting and pytest with coverage
- Builds a Docker image and pushes to ECR
- Deploys to EKS staging on merge to main
- Uses OIDC for AWS authentication (no long-lived keys)
- Caches pip dependencies between runs"
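The OIDC portion of such a workflow typically looks like the following YAML sketch (hedged: the role ARN and region are placeholders; the `aws-actions/configure-aws-credentials` action is the standard mechanism):

```yaml
name: ci
on:
  push:
    branches: [main]
  pull_request:

permissions:
  id-token: write   # required for the OIDC token exchange
  contents: read

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Authenticate to AWS via OIDC (no long-lived keys)
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/example-ci-role
          aws-region: us-east-1
      # lint, test, build, and push steps go here
```

The `id-token: write` permission is the piece AI tools most often omit; without it the credential exchange fails at runtime.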

Choosing the Right Tool

| Scenario | Best Tool | Why |
|---|---|---|
| Quick concept question | ChatGPT | Conversational, fast |
| Debugging with live commands | Claude Code | Has your terminal |
| Bulk refactoring across files | Codex | Async, autonomous |
| Writing code in editor | Copilot | Inline, zero friction |
| Architecture discussion | ChatGPT | Good at tradeoffs |
| Learning a new codebase | Claude Code | Reads your files |
| Writing a one-off script | Claude Code | Interactive iteration |
| Adding tests across repo | Codex | Batch autonomous work |
| Drafting a postmortem | ChatGPT | Strong at structured docs |

War story: An engineer used AI to generate a GitHub Actions workflow that included terraform apply -auto-approve on merge to main — with no plan review step. A typo in a variable caused the workflow to destroy a production RDS instance. The fix: always generate terraform plan as a separate step with human review before apply. AI-generated IaC pipelines should be treated as drafts that need security review, not production-ready configurations.

Fun fact: GitHub Copilot was launched as a technical preview in June 2021, powered by OpenAI Codex. It was one of the first mass-market AI coding tools. By 2023, GitHub reported that Copilot was generating ~46% of code in files where it was active. The CLAUDE.md convention used by Claude Code was inspired by the .github/copilot-instructions.md pattern — giving AI tools project-specific context improves output quality dramatically.

Anti-Patterns to Avoid

  1. Vague prompts: "Make this better" - better how? Be specific.
  2. No context: Pasting code without explaining the stack, constraints, or goal.
  3. Trusting blindly: Always review AI-generated IaC, scripts, and configs. Check for:
     - Hardcoded credentials or secrets
     - Overly permissive IAM policies or security groups
     - Missing error handling
     - Resources without proper tagging
  4. One-shot for complex tasks: Break large requests into smaller, iterative prompts.
  5. Ignoring token limits: Provide relevant excerpts rather than entire files.

Security Considerations

  1. Never paste secrets, API keys, or credentials into any AI tool
  2. Sanitize logs before sharing (remove IPs, usernames, internal hostnames)
  3. Review IAM policies generated by AI - they tend to be overly permissive
  4. Check Terraform plans before applying AI-generated infrastructure code
  5. Understand data residency - relevant for SOC2, HIPAA compliance
  6. AI-generated IaC should always go through terraform plan review, never terraform apply -auto-approve
