Tags Index
Browse all wiki content by tag. Click any tag to see all pages tagged with it.
L0
- Ansible: idempotence + modules vs plugins vs collections
- Ansible: inventory — hosts, groups, vars, targeting
- Ansible: playbook vs play vs task vs role vs handler
- Ansible: variable precedence
- Btrfs: subvolume, snapshot, reflink, CoW
- CI/CD as a System
- CSS Fundamentals
- CSS Fundamentals Footguns
- CSS Fundamentals — Street-Level Ops
- Career Engineering Footguns
- Career Engineering for Ops People - Street-Level Ops
- Container vs VM
- Corporate IT Fluency - Street-Level Ops
- Corporate IT Fluency Footguns
- DNS: Stub Resolver vs Recursive Resolver vs Authoritative Server
- Deployment vs ReplicaSet vs Pod
- DevOps Learning Roadmap
- File vs inode vs pathname vs symlink
- Git Drills
- Git Footguns
- Git for DevOps Engineers - Street Ops
- Git: commit vs branch vs tag vs HEAD
- Git: rebase vs merge
- Git: working tree vs index vs repository
- HTTP Protocol Footguns
- HTTP Protocol — Street-Level Ops
- Homelab & Learning Infrastructure - Street-Level Ops
- Homelab Footguns
- Image vs Container
- K8s Concept Chain — Footguns
- K8s Concept Chain — Street-Level Ops
- Kubernetes Concept Chain
- Kubernetes Control Plane as Reconciliation Engine
- Kubernetes Ecosystem - Street-Level Ops
- Kubernetes Ecosystem Footguns
- Linux Deep Triage
- Linux Ops Drills
- Linux Ops Footguns
- Linux System Administration - Street Ops
- Linux: kernel vs userspace vs distro
- Logs vs Metrics vs Traces
- Mental-Model-First Learning Guide
- Modern CLI Drills
- Modern CLI Tools - Street Ops
- Modern CLI Tools Footguns
- Permissions: mode bits vs ownership vs ACLs vs capabilities
- Persistent Volume vs Persistent Volume Claim
- Pod vs Container (Kubernetes)
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Process vs program vs service
- Python Drills
- RAID vs Backup vs Snapshot
- Reverse Proxy vs Load Balancer
- SQL Fundamentals Footguns
- SQL Fundamentals — Street-Level Ops
- Service vs Ingress (Kubernetes Networking)
- Skillcheck: Bash
- Skillcheck: Docker
- Skillcheck: Git
- Skillcheck: Linux Fundamentals
- Skillcheck: Modern CLI Tools
- Skillcheck: Python Automation
- Storage Stack: Disk, Partition, LVM, Filesystem, Mount
- Systemd Units: Unit, Service, Target, Start vs Enable
- Terraform: Desired State Engine
- Track: Containers
- Track: Foundations
- Trivia compendium
- VS Code Footguns
- VS Code for DevOps - Street Ops
L1
- AI Tools for DevOps - Footguns
- AI Tools for DevOps - Street Ops
- AI-Assisted DevOps Cookbook
- AWS EC2
- AWS EC2 - Street-Level Ops
- AWS EC2 Footguns
- AWS IAM
- AWS IAM - Street-Level Ops
- AWS IAM Footguns
- AWS Networking
- AWS Networking - Street-Level Ops
- AWS Networking Footguns
- AWS S3 Deep Dive
- Advanced Bash Footguns
- Advanced Bash for Ops - Street-Level Ops
- Ansible Footguns
- Ansible for Infrastructure Automation - Street Ops
- Binary and Floating Point Footguns
- Binary and Floating Point — Street-Level Ops
- Binary and Floats
- CI Pipeline Documentation
- CI/CD Drills
- CI/CD Footguns
- CI/CD Pipelines - Street Ops
- Case Study: ARP Flux Duplicate IP
- Case Study: BGP Peer Flapping
- Case Study: BIOS Settings Reset After CMOS
- Case Study: BMC Clock Skew Cert Failure
- Case Study: Bonding Failover Not Working
- Case Study: Cable Management Wrong Port
- Case Study: CrashLoopBackOff No Logs
- Case Study: DHCP Relay Broken
- Case Study: DNS Resolution Slow
- Case Study: DNS Split Horizon Confusion
- Case Study: DaemonSet Blocks Eviction
- Case Study: Disk Full Root Services Down
- Case Study: Duplex Mismatch Symptoms
- Case Study: Firewall Shadow Rule
- Case Study: Firmware Update Boot Loop
- Case Study: HBA Firmware Mismatch
- Case Study: ImagePullBackOff Registry Auth
- Case Study: Inode Exhaustion
- Case Study: Jumbo Frames Partial
- Case Study: LACP Mismatch One Link Hot
- Case Study: Link Flaps Bad Optic
- Case Study: Memory ECC Errors Increasing
- Case Study: Multicast Not Crossing Router
- Case Study: NVMe Drive Disappeared
- Case Study: OS Install Fails RAID Controller
- Case Study: OSPF Stuck In Exstart
- Case Study: PXE Boot Fails UEFI Mismatch
- Case Study: Persistent Volume Stuck Terminating
- Case Study: Power Supply Redundancy Lost
- Case Study: Proxy ARP Causing Issues
- Case Study: Rack PDU Overload Alert
- Case Study: Resource Quota Blocking Deploy
- Case Study: SELinux Denying Service
- Case Study: SSL Cert Chain Incomplete
- Case Study: Serial Console Garbled
- Case Study: Server Intermittent Reboot
- Case Study: Server Remote Console Lag
- Case Study: Service No Endpoints
- Case Study: Source Routing Policy Miss
- Case Study: Systemd Service Flapping
- Case Study: TCP RST After Idle
- Case Study: Thermal Throttle Fan Failure
- Case Study: VLAN Trunk Mistag
- Case Study: iDRAC Unreachable OS Up
- Change Management - Street-Level Ops
- Change Management Footguns
- Cisco Fundamentals -- Street Ops
- Cisco Fundamentals Footguns
- Cloud Operations Basics - Street Ops
- Cloud Ops Drills
- Cloud Ops Footguns
- Container Base Images — Footguns & Pitfalls
- Container Base Images — Street Ops
- Containers Deep Dive
- Containers Deep Dive - Footguns & Pitfalls
- Containers Deep Dive - Street-Level Ops
- Cron & Job Scheduling - Street-Level Ops
- Cron & Job Scheduling Footguns
- DHCP & IP Address Management - Street-Level Ops
- DHCP & IP Address Management Footguns
- DNF Package Manager
- DNS Deep Dive - Footguns
- DNS Deep Dive - Street-Level Ops
- DORA Metrics & DevEx Footguns
- DORA Metrics & DevEx — Street-Level Ops
- Datacenter & Server Hardware - Street Ops
- Datacenter Advanced Operations
- Datacenter Drills
- Datacenter Footguns
- Debian & Ubuntu — Footguns & Pitfalls
- Debian & Ubuntu — Street Ops
- Debugging Methodology - Street-Level Ops
- Debugging Methodology Footguns
- Dell PowerEdge Footguns
- Dell PowerEdge — Street-Level Ops
- Disk & Storage Ops
- Docker Drills
- Drills
- Feature Flags Footguns
- Feature Flags — Street-Level Ops
- Git Workflows & Branching Strategies
- GitHub Actions - Street-Level Ops
- GitHub Actions Footguns
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist: Memory ECC Errors Increasing
- Grading Checklist: PXE Boot Fails - UEFI Mismatch
- Grading Checklist: TLS Works From Some Clients But Fails From Others
- Grading Checklist: Thermal Throttle - Fan Failure
- Helm Drills
- IPMI and ipmitool -- Street Ops
- IPMI and ipmitool Footguns
- Infrastructure as Code with Terraform - Street Ops
- Inodes
- Interview: Docker Container Debugging
- Interview: Linux Server Slow
- Kubernetes Pods & Scheduling - Street Ops
- Kubernetes Pods & Scheduling Footguns
- Kubernetes Services & Ingress - Street Ops
- Kubernetes Services & Ingress Footguns
- Kustomize - Street-Level Ops
- Kustomize Footguns
- Legacy System Archaeology - Street-Level Ops
- Legacy System Archaeology Footguns
- Linux Boot Process
- Linux Boot Process — Footguns & Pitfalls
- Linux Boot Process — Street Ops
- Linux Data Hoarding
- Linux Distribution Comparison — Footguns & Pitfalls
- Linux Distribution Comparison — Street Ops
- Linux Logging
- Linux Logging — Footguns
- Linux Logging — Street Ops
- Linux Memory Management
- Linux Memory Management — Footguns
- Linux Memory Management — Street Ops
- Linux Ops Storage
- Linux Ops Systemd
- Linux Signals & Process Control - Footguns
- Linux Signals & Process Control - Street-Level Ops
- Linux Text Processing
- Linux Text Processing - Street-Level Ops
- Linux Text Processing Footguns
- Linux Text Processing — Trivia & History
- Linux Users & Permissions
- Linux Users and Permissions — Footguns & Pitfalls
- Linux Users and Permissions — Street Ops
- Load Testing Footguns
- Load Testing — Street-Level Ops
- Modern CLI Workflows
- MongoDB Operations Footguns
- MongoDB Operations — Street-Level Ops
- Monitoring Fundamentals - Street-Level Ops
- Monitoring Fundamentals Footguns
- MySQL / MariaDB Operations Footguns
- MySQL / MariaDB Operations — Street-Level Ops
- Network Traps & Deep Debugging
- Networking - Street Ops
- Networking Drills
- Networking Footguns
- Nginx & Web Servers - Street-Level Ops
- Nginx & Web Servers Footguns
- Ops-Focused Security Basics - Street Ops
- Pipes & Redirection - Footguns
- Pipes & Redirection - Street-Level Ops
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Process Management - Street-Level Ops
- Process Management Footguns
- Python Debugging
- Python Debugging Footguns
- Python Debugging — Street-Level Ops
- Python for Infrastructure - Street-Level Ops
- Python for Infrastructure Footguns
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions: Memory ECC Errors Increasing
- Questions: PXE Boot Fails - UEFI Mismatch
- Questions: TLS Works From Some Clients But Fails From Others
- Questions: Thermal Throttle - Fan Failure
- Redfish -- Footguns
- Redfish -- Street Ops
- Regex & Text Wrangling - Street-Level Ops
- Regex & Text Wrangling Footguns
- Runbook Craft - Street-Level Ops
- Runbook Craft Footguns
- Runbook: Helm Upgrade Failed
- Runbook: ImagePullBackOff
- Runbook: Ingress 404
- Runbook: Readiness Probe Failed
- S3-Compatible Object Storage Footguns
- S3-Compatible Object Storage — Street-Level Ops
- SSH Deep Dive
- SSH Deep Dive — Footguns
- SSH Deep Dive — Street-Level Ops
- Scenario: Duplex Mismatch
- Scenario: RAID Array Degraded
- Scenario: Server Won't Boot After Update
- Scenario: Thermal Throttling
- Security Footguns
- Skillcheck
- Skillcheck: CI/CD
- Skillcheck: Cloud Basics
- Skillcheck: Cloud Providers
- Skillcheck: Datacenter
- Skillcheck: DevOps Roadmap (Expanded)
- Skillcheck: Helm & Release Ops
- Skillcheck: Kubernetes
- Skillcheck: Networking Fundamentals
- Skillcheck: Terraform / IaC
- Solution
- Solution
- Solution
- Solution
- Solution: Bonding Failover Not Working
- Solution: Memory ECC Errors Increasing
- Solution: PXE Boot Fails - UEFI Mismatch
- Solution: TLS Works From Some Clients But Fails From Others
- Solution: Thermal Throttle - Fan Failure
- Symptoms
- Symptoms
- Symptoms
- Symptoms
- Symptoms: Memory ECC Errors Increasing
- Symptoms: Network Bonding Failover Not Working
- Symptoms: PXE Boot Fails - UEFI Mismatch
- Symptoms: TLS Works From Some Clients But Fails From Others
- Symptoms: Thermal Throttle - Fan Failure
- Synthetic Monitoring Footguns
- Synthetic Monitoring — Street-Level Ops
- Systems Thinking Footguns
- Systems Thinking for Engineers - Street-Level Ops
- TLS & Certificates Ops - Street-Level Ops
- TLS & Certificates Ops Footguns
- Terminal Internals
- Terminal Internals - Street Ops
- Terminal Internals Footguns
- Terraform Drills
- Terraform Footguns
- Track: Helm & Release Ops
- Track: Infrastructure
- Track: Kubernetes Core
- Trivia compendium
- Trivia compendium
- Vendor Management & Escalation - Street-Level Ops
- Vendor Management & Escalation Footguns
- YAML, JSON & Config Formats - Footguns
- YAML, JSON & Config Formats - Street Ops
- awk — Footguns
- awk — Street-Level Ops
- awk: The Record/Field Processor
- curl & wget
- curl & wget — Footguns
- curl & wget — Street-Level Ops
- find - Footguns & Pitfalls
- find - Street-Level Ops
- grep & Regular Expressions
- grep & Regular Expressions - Footguns
- grep & Regular Expressions - Street-Level Ops
- iptables & nftables
- iptables & nftables - Street-Level Ops
- iptables & nftables Footguns
- kubectl Debugging Cheatsheet
- kubectl Drills
- rsync - Street Ops
- rsync Footguns
- sed — Footguns
- sed — Street-Level Ops
- sed: The Stream Editor
- strace Footguns
- strace — Street-Level Ops
- systemctl & journalctl Footguns
- systemctl & journalctl Street Ops
- tar & Compression - Footguns
- tar & Compression - Street-Level Ops
- tmux & screen
- xargs - Footguns & Pitfalls
- xargs - Street Ops
L2
- AI/ML Ops Footguns
- API Gateways & Ingress - Street-Level Ops
- API Gateways & Ingress Footguns
- AWS Lambda
- AWS Lambda - Street-Level Ops
- AWS Lambda Footguns
- AWS Route 53
- AWS Route 53 - Street-Level Ops
- AWS Route 53 Footguns
- Alerting Rules Drills
- Alerting Rules Footguns
- Ansible Deep Dive - Footguns
- Ansible Deep Dive - Street Ops
- Argo Workflows Footguns
- Argo Workflows — Street-Level Ops
- ArgoCD & GitOps Footguns
- ArgoCD & GitOps — Street-Level Ops
- Backstage - Street-Level Ops
- Backstage Footguns
- Bare-Metal Provisioning - Street-Level Ops
- Bare-Metal Provisioning Footguns
- Capacity Planning - Street-Level Ops
- Capacity Planning Footguns
- Case Study: API Latency Spike — BGP Route Leak, Fix Is Network ACL
- Case Study: Alert Storm — Flapping Health Checks
- Case Study: Ansible Playbook Hangs — SSH Agent Forwarding Blocked by Firewall
- Case Study: Asymmetric Routing One Direction
- Case Study: Backup Job Failing — iSCSI Target Unreachable, VLAN Misconfigured
- Case Study: CI Pipeline Fails — Docker Layer Cache Corruption
- Case Study: CNI Broken After Restart
- Case Study: Canary Deploy Routing to Wrong Backend — Ingress Misconfigured
- Case Study: Container Vuln Scanner False Positive Blocks Deploy
- Case Study: CoreDNS Timeout Pod DNS
- Case Study: DNS Looks Broken — TLS Expired, Fix Is Cert-Manager
- Case Study: Database Replication Lag — Root Cause Is RAID Degradation
- Case Study: Deployment Stuck — ImagePull Auth Failure, Vault Secret Rotation
- Case Study: Disk Full — Runaway Logs, Fix Is Loki Retention
- Case Study: Drain Blocked by PDB
- Case Study: Grafana Dashboard Empty — Prometheus Blocked by NetworkPolicy
- Case Study: HPA Flapping — Metrics Server Clock Skew, Fix Is NTP
- Case Study: IPTables Blocking Unexpected
- Case Study: Job Queue Backlog — Worker Pod CPU Throttled by cgroup
- Case Study: Kernel Soft Lockup
- Case Study: MTU Blackhole TLS Stalls
- Case Study: NAT Exhaustion Intermittent
- Case Study: Network Loop Broadcast Storm
- Case Study: Node NotReady — NIC Firmware Bug, Fix Is Ansible Playbook
- Case Study: Node Pressure Evictions
- Case Study: OOM Killer Events
- Case Study: Pod OOMKilled — Memory Leak in Sidecar, Fix Is Helm Values
- Case Study: RAID Degraded Rebuild Latency
- Case Study: Runaway Logs Fill Disk
- Case Study: SSH Timeout — MTU Mismatch, Fix Is Terraform Variable
- Case Study: Service Mesh 503s — Envoy Misconfigured, RBAC Policy
- Case Study: Stuck NFS Mount
- Case Study: Terraform Apply Fails — State Lock Stuck, DynamoDB Throttle
- Case Study: Time Sync Skew Breaks App
- Case Study: User Auth Failing — OIDC Cert Expired, Cloud KMS Rotation
- Case Study: Zombie Processes Accumulating
- Ceph Storage Footguns
- Ceph Storage — Street-Level Ops
- Chaos Engineering & Fault Injection - Street-Level Ops
- Chaos Engineering Footguns
- Cilium & eBPF Networking - Street-Level Ops
- Cilium & eBPF Networking Footguns
- Cloud Deep Dive Drills
- Cloud Deep-Dive Footguns
- Cloud Provider Deep-Dive - Street-Level Ops
- Compliance & Audit Automation - Street-Level Ops
- Compliance & Audit Automation Footguns
- Container Runtime Drills
- Continuous Profiling Footguns
- Continuous Profiling — Street-Level Ops
- Cost Optimization & FinOps - Street-Level Ops
- Crossplane - Street-Level Ops
- Crossplane Footguns
- DNS Operations - Street-Level Ops
- DNS Operations Footguns
- Dagger - Street-Level Ops
- Dagger Footguns
- Database Operations - Street-Level Ops
- Database Ops Drills
- Database Ops Footguns
- Deep Dive: CI/CD Pipeline Architecture
- Deep Dive: Containers How They Really Work
- Deep Dive: Docker Image Internals
- Deep Dive: Kubernetes Networking
- Deep Dive: Kubernetes Pod Lifecycle
- Deep Dive: Linux Boot Sequence
- Deep Dive: Linux Memory Management
- Deep Dive: Linux Network Packet Flow
- Deep Dive: Linux Performance Debugging
- Deep Dive: Systemd Architecture
- Deep Dive: Terraform State Internals
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Disaster Recovery & Backup Engineering - Street-Level Ops
- Disaster Recovery Footguns
- Distributed Systems Footguns
- Distributed Systems Fundamentals — Street-Level Ops
- Edge & IoT Infrastructure - Street-Level Ops
- Edge & IoT Infrastructure Footguns
- FinOps Drills
- FinOps Footguns
- Fleet Operations Footguns
- Fleet Operations at Scale - Street-Level Ops
- Footguns
- Footguns
- Footguns
- Footguns
- Footguns
- GitOps & ArgoCD Drills
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist: Network Experiencing Broadcast Storm and High CPU on Switches
- Grading Checklist: RAID Degraded Rebuild Latency
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- HAProxy & Nginx Load Balancing Footguns
- HAProxy & Nginx for Ops - Street-Level Ops
- HashiCorp Vault - Street-Level Ops
- HashiCorp Vault Footguns
- Incident Command & On-Call - Street-Level Ops
- Incident Command & On-Call Footguns
- Incident Postmortem & SLO/SLI - Street-Level Ops
- Infrastructure Forensics - Street-Level Ops
- Infrastructure Forensics Footguns
- Infrastructure Testing Footguns
- Infrastructure Testing — Street-Level Ops
- Interview: CI Vuln Scan Failed
- Interview: Certificate Expired
- Interview: Config Drift Detected
- Interview: Cost Spike Investigation
- Interview: Deployment Stuck Progressing
- Interview: GitOps Drift Detected
- Interview: HPA Not Scaling
- Interview: Helm Upgrade Broke Prod
- Interview: Ingress 404
- Interview: Kyverno Blocking Deploys
- Interview: Loki Logs Disappeared
- Interview: Pods OOMKilled
- Interview: Prometheus Target Down
- Interview: RBAC Forbidden
- Interview: Secret Leaked to Git
- Interview: Server Won't POST
- Interview: Vault Token Expired
- Investigation: Alert Storm, Caused by Flapping Health Checks, Fix Is Probe Tuning
- Investigation: Ansible Playbook Hangs, SSH Agent Forwarding Broken, Root Cause Is Firewall Rule
- Investigation: CI Pipeline Fails, Docker Layer Cache Corruption, Fix Is Registry GC
- Investigation: Container Image Vuln Scanner False Positive, Blocks Deploy Pipeline
- Investigation: DNS Looks Broken, TLS Is Expired, Fix Is in Cert-Manager
- Investigation: Deployment Stuck, ImagePull Auth Failure, Fix Is Vault Secret Rotation
- Investigation: Disk Full Alert, Cause Is Runaway Logs, Fix Is Loki Retention
- Investigation: Grafana Dashboard Empty, Prometheus Scrape Blocked by NetworkPolicy
- Investigation: HPA Flapping, Metrics Server Clock Skew, Fix Is NTP Config
- Investigation: Job Queue Backlog, Worker Pod CPU Throttled, Fix Is cgroup Config
- Investigation: Pod OOMKilled, Memory Leak Is in Sidecar, Fix Is Helm Values
- Investigation: SSH Timeout, MTU Mismatch, Fix Is Terraform Variable
- Investigation: Terraform Apply Fails, State Lock Stuck, Root Cause Is DynamoDB Throttle
- Istio Service Mesh Footguns
- Istio Service Mesh — Street-Level Ops
- Kubernetes Debugging -- Street Ops
- Kubernetes Debugging Footguns
- Kubernetes Node Lifecycle & Cluster Upgrades
- Kubernetes Node Lifecycle -- Street Ops
- Kubernetes Node Lifecycle Footguns
- Kubernetes Ops Footguns
- LDAP & Identity Management - Street-Level Ops
- LDAP & Identity Management Footguns
- LPIC / LFCS — Footguns & Pitfalls
- LPIC / LFCS — Street Ops
- Linux Kernel Tuning - Street-Level Ops
- Linux Kernel Tuning Footguns
- Linux Performance Tuning - Street-Level Ops
- Linux Performance Tuning Footguns
- Log Analysis & Alerting Rules - Street-Level Ops
- Log Pipelines - Street-Level Ops
- Log Pipelines Footguns
- LogQL Drills
- Mellanox Switches
- Modern CLI Workflows Footguns
- Monitoring Migration (Legacy to Modern) - Street-Level Ops
- Monitoring Migration Footguns
- Multi-Cluster & Federation - Exercises & Reference
- Multi-Tenancy Patterns - Street-Level Ops
- Multi-Tenancy Patterns Footguns
- Nix / NixOS - Street-Level Ops
- Nix / NixOS Footguns
- Observability Deep Dive - Street Ops
- Observability Drills
- Observability Footguns
- OpenTelemetry - Street-Level Ops
- OpenTelemetry Footguns
- OpenTofu - Street-Level Ops
- OpenTofu Footguns
- Ops Archaeology: The 5% That Can't Resolve
- Ops Archaeology: The Alerts That Stopped Firing
- Ops Archaeology: The Certificate That Works Sometimes
- Ops Archaeology: The Cluster That Disagrees With Itself
- Ops Archaeology: The Container That Exits Immediately
- Ops Archaeology: The DR That Looks Ready But Isn't
- Ops Archaeology: The Deploy That Didn't Deploy
- Ops Archaeology: The Gateway That Returns 502
- Ops Archaeology: The Job That Succeeded Wrong
- Ops Archaeology: The Pods That Won't Schedule
- Ops Archaeology: The Replica That Fell Behind
- Ops Archaeology: The Requests That Vanish
- Ops Archaeology: The Service That Won't Start
- Ops Archaeology: The Session Store That Keeps Dying
- Ops Archaeology: The Slow Death Nobody Noticed
- Ops War Stories & Pattern Recognition - Street-Level Ops
- Ops War Stories & Pattern Recognition Footguns
- Platform Engineering Footguns
- Platform Engineering Patterns - Street-Level Ops
- Policy Engine Drills
- Policy Engine Footguns
- Policy Engines - Street-Level Ops
- PostgreSQL Footguns
- PostgreSQL Operations - Street-Level Ops
- Postmortem & SLO Drills
- Postmortem & SLO Footguns
- Practical Kubernetes Ops - Street Ops
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Progressive Delivery Footguns
- Progressive Delivery — Street-Level Ops
- PromQL Drills
- Pulumi - Street-Level Ops
- Pulumi Footguns
- Python Async & Concurrency
- Python Packaging
- Python Packaging Footguns
- Python Packaging — Street-Level Ops
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions: Network Experiencing Broadcast Storm and High CPU on Switches
- Questions: RAID Degraded Rebuild Latency
- RHCE (EX294) — Footguns & Pitfalls
- RHCE (EX294) — Street Ops
- RabbitMQ Footguns
- RabbitMQ Operations - Street-Level Ops
- Redis Footguns
- Redis Operations - Street-Level Ops
- Remediation: Alert Storm, Caused by Flapping Health Checks, Fix Is Probe Tuning
- Remediation: Ansible Playbook Hangs, SSH Agent Forwarding Broken, Root Cause Is Firewall Rule
- Remediation: CI Pipeline Fails, Docker Layer Cache Corruption, Fix Is Registry GC
- Remediation: Container Image Vuln Scanner False Positive, Blocks Deploy Pipeline
- Remediation: DNS Looks Broken, TLS Is Expired, Fix Is in Cert-Manager
- Remediation: Deployment Stuck, ImagePull Auth Failure, Fix Is Vault Secret Rotation
- Remediation: Disk Full Alert, Cause Is Runaway Logs, Fix Is Loki Retention
- Remediation: Grafana Dashboard Empty, Prometheus Scrape Blocked by NetworkPolicy
- Remediation: HPA Flapping, Metrics Server Clock Skew, Fix Is NTP Config
- Remediation: Job Queue Backlog, Worker Pod CPU Throttled, Fix Is cgroup Config
- Remediation: Pod OOMKilled, Memory Leak Is in Sidecar, Fix Is Helm Values
- Remediation: SSH Timeout, MTU Mismatch, Fix Is Terraform Variable
- Remediation: Terraform Apply Fails, State Lock Stuck, Root Cause Is DynamoDB Throttle
- Runbook: ArgoCD Out of Sync
- Runbook: Certificate Renewal Failed
- Runbook: Disaster Recovery
- Runbook: HPA Not Scaling
- Runbook: Kyverno Blocking Workloads
- Runbook: Loki No Logs
- Runbook: NetworkPolicy Block
- Runbook: Pod Eviction
- Runbook: RBAC Forbidden
- Runbook: Secret Rotation
- Runbook: Tempo No Traces
- Runbook: VPC IP Exhaustion
- Runbook: Velero Backup & Restore
- Runbook: etcd Backup & Restore
- SELinux & AppArmor - Street-Level Ops
- SELinux & AppArmor Footguns
- SELinux & Linux Hardening - Street-Level Ops
- SELinux & Linux Hardening Footguns
- SLO Tooling Footguns
- SLO Tooling — Street-Level Ops
- SQLite Footguns
- SQLite Operations & Internals - Street-Level Ops
- SRE Practices - Street-Level Ops
- SRE Practices Footguns
- Scenario: Asymmetric Routing
- Scenario: DNS Looks Fine but App Fails
- Scenario: MTU Blackhole
- Scenario: NIC Flapping / LACP Mismatch
- Scenario: OOB Unreachable but Host Responds
- Scenario: VLAN Trunk Mismatch
- Secrets Management - Street-Level Ops
- Secrets Management Drills
- Secrets Management Footguns
- Security Drills
- Skillcheck: Alerting Rules
- Skillcheck: Container Runtime Debug
- Skillcheck: Database Ops
- Skillcheck: FinOps
- Skillcheck: GitOps
- Skillcheck: Kubernetes Under the Covers
- Skillcheck: Observability
- Skillcheck: Policy Engines
- Skillcheck: Postmortems & SLOs
- Skillcheck: Secrets Management
- Skillcheck: Security (Expanded)
- Skillcheck: TLS & PKI
- Skillcheck: etcd
- Solution
- Solution
- Solution
- Solution
- Solution
- Solution
- Solution: Asymmetric Routing / One-Direction Failure
- Solution: MTU Black Hole / TLS Stalls
- Solution: NAT Port Exhaustion / Intermittent Failures
- Solution: Network Experiencing Broadcast Storm and High CPU on Switches
- Solution: RAID Degraded Rebuild Latency
- Storage Operations - Street-Level Ops
- Storage Operations Footguns
- Street ops
- Street ops
- Street ops
- Street ops
- Street ops
- Supply Chain Security - Street-Level Ops
- Supply Chain Security Footguns
- Symptoms
- Symptoms
- Symptoms
- Symptoms
- Symptoms
- Symptoms
- Symptoms: Alert Storm, Caused by Flapping Health Checks, Fix Is Probe Tuning
- Symptoms: Ansible Playbook Hangs, SSH Agent Forwarding Broken, Root Cause Is Firewall Rule
- Symptoms: Asymmetric Routing / One-Direction Failure
- Symptoms: CI Pipeline Fails, Docker Layer Cache Corruption, Fix Is Registry GC
- Symptoms: Container Image Vuln Scanner False Positive, Blocks Deploy Pipeline
- Symptoms: DNS Looks Broken, TLS Is Expired, Fix Is in Cert-Manager
- Symptoms: Deployment Stuck, ImagePull Auth Failure, Fix Is Vault Secret Rotation
- Symptoms: Disk Full Alert, Cause Is Runaway Logs, Fix Is Loki Retention
- Symptoms: Grafana Dashboard Empty, Prometheus Scrape Blocked by NetworkPolicy
- Symptoms: HPA Flapping, Metrics Server Clock Skew, Fix Is NTP Config
- Symptoms: Job Queue Backlog, Worker Pod CPU Throttled, Fix Is cgroup Config
- Symptoms: MTU Black Hole / TLS Stalls
- Symptoms: NAT Port Exhaustion / Intermittent Failures
- Symptoms: Network Experiencing Broadcast Storm and High CPU on Switches
- Symptoms: Pod OOMKilled, Memory Leak Is in Sidecar, Fix Is Helm Values
- Symptoms: RAID Degraded Rebuild Latency
- Symptoms: SSH Timeout, MTU Mismatch, Fix Is Terraform Variable
- Symptoms: Terraform Apply Fails, State Lock Stuck, Root Cause Is DynamoDB Throttle
- TCP/IP Deep Dive - Street-Level Ops
- TCP/IP Deep Dive Footguns
- TLS & PKI Drills
- Tailscale - Street-Level Ops
- Tailscale Footguns
- Terraform Deep Dive - Footguns
- Terraform Deep Dive - Street Ops
- The Ops of AI/ML Workloads - Street-Level Ops
- The Psychology of Incidents - Street-Level Ops
- The Psychology of Incidents Footguns
- Track: Incident Response
- Track: Observability
- VPN & Tunneling - Street-Level Ops
- VPN & Tunneling Footguns
- Virtualization - Street-Level Ops
- Virtualization Footguns
- Wireshark / tshark / tcpdump - Street-Level Ops
- Wireshark / tshark / tcpdump Footguns
- cgroups & Linux Namespaces - Street Ops
- cgroups & Namespaces Footguns
- etcd Drills
- gRPC - Street-Level Ops
- gRPC Footguns
- perf Profiling
- perf Profiling Footguns
- perf Profiling — Street-Level Ops
L3
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Interview: Database Failover During Deploy
- Interview: Service Mesh 503s
- Interview: etcd Space Exceeded
- Investigation: API Latency Spike, BGP Route Leak, Fix Is Network ACL
- Investigation: Backup Job Failing, iSCSI Target Unreachable, Fix Is VLAN Config
- Investigation: Canary Deploy Looks Healthy, Actually Routing to Wrong Backend, Ingress Misconfigured
- Investigation: Database Replication Lag, Root Cause Is RAID Degradation
- Investigation: Node NotReady, NIC Firmware Bug, Fix Is Ansible Playbook
- Investigation: Service Mesh 503s, Envoy Misconfigured, Root Cause Is RBAC Policy
- Investigation: User Auth Failing, OIDC Cert Expired, Fix Is Cloud KMS Rotation
- Kernel Troubleshooting - Street-Level Ops
- Kernel Troubleshooting Footguns
- Kubernetes Operators Drills
- Primer
- Primer
- Primer
- Remediation: API Latency Spike, BGP Route Leak, Fix Is Network ACL
- Remediation: Backup Job Failing, iSCSI Target Unreachable, Fix Is VLAN Config
- Remediation: Canary Deploy Looks Healthy, Actually Routing to Wrong Backend, Ingress Misconfigured
- Remediation: Database Replication Lag, Root Cause Is RAID Degradation
- Remediation: Node NotReady, NIC Firmware Bug, Fix Is Ansible Playbook
- Remediation: Service Mesh 503s, Envoy Misconfigured, Root Cause Is RBAC Policy
- Remediation: User Auth Failing, OIDC Cert Expired, Fix Is Cloud KMS Rotation
- Runbook: Istio 503 Errors
- Scenario: etcd Troubleshooting
- Service Mesh - Street-Level Ops
- Service Mesh Drills
- Service Mesh Footguns
- Skillcheck: Kubernetes Operators
- Skillcheck: Service Mesh
- Symptoms: API Latency Spike, BGP Route Leak, Fix Is Network ACL
- Symptoms: Backup Job Failing, iSCSI Target Unreachable, Fix Is VLAN Config
- Symptoms: Canary Deploy Looks Healthy, Actually Routing to Wrong Backend, Ingress Misconfigured
- Symptoms: Database Replication Lag, Root Cause Is RAID Degradation
- Symptoms: Node NotReady, NIC Firmware Bug, Fix Is Ansible Playbook
- Symptoms: Service Mesh 503s, Envoy Misconfigured, Root Cause Is RBAC Policy
- Symptoms: User Auth Failing, OIDC Cert Expired, Fix Is Cloud KMS Rotation
- WebAssembly for Infrastructure - Street-Level Ops
- WebAssembly for Infrastructure Footguns
- eBPF & Modern Linux Observability - Street-Level Ops
- eBPF & Modern Linux Observability Footguns
acl
- Diagnostic Questions
- Grading Rubric
- Investigation: API Latency Spike, BGP Route Leak, Fix Is Network ACL
- Remediation: API Latency Spike, BGP Route Leak, Fix Is Network ACL
- Symptoms: API Latency Spike, BGP Route Leak, Fix Is Network ACL
acme
ai
- AI DevOps Tools — Trivia & Interesting Facts
- AI/ML Ops — Trivia & Interesting Facts
- Claude Code — Trivia & Interesting Facts
ai-devops-tools
- AI Tools for DevOps - Footguns
- AI Tools for DevOps - Street Ops
- AI-Assisted DevOps Cookbook
- AI/ML Ops Footguns
- Primer
- Primer
- The Ops of AI/ML Workloads - Street-Level Ops
ai_devops_tools
ai_ml_ops
alerting
- Alerting Rules
- Alerting Rules Drills
- Alerting Rules Footguns
- Anti-Primer: Alerting Rules
- Log Analysis & Alerting Rules - Street-Level Ops
- On-Call
- Primer
- Primer
- SRE Practices - Street-Level Ops
- SRE Practices Footguns
- Skillcheck: Alerting Rules
- Thinking Out Loud: Alerting Rules
alertmanager¶
- Primer
- Primer
- SLO Tooling Footguns
- SLO Tooling — Street-Level Ops
- Synthetic Monitoring Footguns
- Synthetic Monitoring — Street-Level Ops
alerts¶
- Diagnostic Questions
- Grading Rubric
- Investigation: Alert Storm, Caused by Flapping Health Checks, Fix Is Probe Tuning
- Remediation: Alert Storm, Caused by Flapping Health Checks, Fix Is Probe Tuning
- Symptoms: Alert Storm, Caused by Flapping Health Checks, Fix Is Probe Tuning
ansible¶
- Ansible Deep Dive - Footguns
- Ansible Deep Dive - Street Ops
- Ansible Footguns
- Ansible for Infrastructure Automation - Street Ops
- Anti-Primer: Ansible
- Diagnostic Questions
- Diagnostic Questions
- Drills
- Fleet Operations Footguns
- Fleet Operations at Scale - Street-Level Ops
- Grading Rubric
- Grading Rubric
- Investigation: Ansible Playbook Hangs, SSH Agent Forwarding Broken, Root Cause Is Firewall Rule
- Investigation: Node NotReady, NIC Firmware Bug, Fix Is Ansible Playbook
- Primer
- Primer
- Primer
- Primer
- RHCE (EX294) — Footguns & Pitfalls
- RHCE (EX294) — Street Ops
- Remediation: Ansible Playbook Hangs, SSH Agent Forwarding Broken, Root Cause Is Firewall Rule
- Remediation: Node NotReady, NIC Firmware Bug, Fix Is Ansible Playbook
- Skillcheck
- Symptoms: Ansible Playbook Hangs, SSH Agent Forwarding Broken, Root Cause Is Firewall Rule
- Symptoms: Node NotReady, NIC Firmware Bug, Fix Is Ansible Playbook
- Thinking Out Loud: Ansible
- Topics
- Track: Infrastructure
- Trivia compendium
ansible_deep_dive¶
api-design¶
api-gateway¶
api_gateway¶
architecture¶
- Architecture & Design Models
- Decision Tree: Managed vs Self-Hosted Service
- Decision Tree: Monolith vs Microservices
- Decision Tree: Sync vs Async Communication
- Decision Tree: Where Should This Run?
- Decision Tree: Which Database for This Workload?
- Kubernetes Concept Chain
- Primer
argo-rollouts¶
argo-workflows¶
argo_workflows¶
argocd¶
argocd_gitops¶
- Anti-Primer: ArgoCD GitOps
- Anti-Primer: GitOps
- ArgoCD & GitOps
- ArgoCD & GitOps - Street-Level Ops
- GitOps
- GitOps Footguns
- Thinking Out Loud: ArgoCD & GitOps
arp¶
assessment¶
- Production Readiness Assessment
- Skillcheck
- Skillcheck: Alerting Rules
- Skillcheck: Bash
- Skillcheck: CI/CD
- Skillcheck: Cloud Basics
- Skillcheck: Cloud Providers
- Skillcheck: Container Runtime Debug
- Skillcheck: Database Ops
- Skillcheck: Datacenter
- Skillcheck: DevOps Roadmap (Expanded)
- Skillcheck: Docker
- Skillcheck: FinOps
- Skillcheck: Git
- Skillcheck: GitOps
- Skillcheck: Helm & Release Ops
- Skillcheck: Kubernetes
- Skillcheck: Kubernetes Operators
- Skillcheck: Kubernetes Under the Covers
- Skillcheck: Linux Fundamentals
- Skillcheck: Modern CLI Tools
- Skillcheck: Networking Fundamentals
- Skillcheck: Observability
- Skillcheck: Policy Engines
- Skillcheck: Postmortems & SLOs
- Skillcheck: Python Automation
- Skillcheck: Secrets Management
- Skillcheck: Security (Expanded)
- Skillcheck: Service Mesh
- Skillcheck: TLS & PKI
- Skillcheck: Terraform / IaC
- Skillcheck: etcd
async¶
audit-logging¶
- Compliance & Audit Automation - Street-Level Ops
- Compliance & Audit Automation Footguns
- Infrastructure Forensics - Street-Level Ops
- Infrastructure Forensics Footguns
- Primer
- Primer
- Primer
- SELinux & Linux Hardening - Street-Level Ops
- SELinux & Linux Hardening Footguns
audit_logging¶
auth¶
- Diagnostic Questions
- Grading Rubric
- Investigation: User Auth Failing, OIDC Cert Expired, Fix Is Cloud KMS Rotation
- Remediation: User Auth Failing, OIDC Cert Expired, Fix Is Cloud KMS Rotation
- Symptoms: User Auth Failing, OIDC Cert Expired, Fix Is Cloud KMS Rotation
automation¶
aws¶
- AWS EC2
- AWS EC2 - Street-Level Ops
- AWS EC2 Footguns
- AWS IAM
- AWS IAM - Street-Level Ops
- AWS IAM Footguns
- AWS Lambda
- AWS Lambda - Street-Level Ops
- AWS Lambda Footguns
- AWS Networking
- AWS Networking - Street-Level Ops
- AWS Networking Footguns
- AWS Route 53
- AWS Route 53 - Street-Level Ops
- AWS Route 53 Footguns
- AWS S3 Deep Dive
- AWS Troubleshooting — Trivia & Interesting Facts
- Primer
- Primer
- Primer
- Primer
- Primer
aws_cloudwatch¶
- AWS CloudWatch
- AWS CloudWatch - Street-Level Ops
- AWS CloudWatch Footguns
- Anti-Primer: AWS CloudWatch
aws_ec2¶
aws_ecs¶
aws_iam¶
aws_lambda¶
aws_networking¶
aws_route53¶
aws_s3_deep_dive¶
aws_troubleshooting¶
- AWS Troubleshooting
- AWS Troubleshooting - Street-Level Ops
- AWS Troubleshooting Footguns
- Anti-Primer: AWS Troubleshooting
azure¶
azure_troubleshooting¶
- Anti-Primer: Azure Troubleshooting
- Azure Troubleshooting
- Azure Troubleshooting - Street-Level Ops
- Azure Troubleshooting Footguns
backstage¶
- Anti-Primer: Backstage
- Backstage & Developer Portals
- Backstage - Street-Level Ops
- Backstage Footguns
backup¶
- Diagnostic Questions
- Grading Rubric
- Investigation: Backup Job Failing, iSCSI Target Unreachable, Fix Is VLAN Config
- Linux Data Hoarding
- Primer
- Primer
- Remediation: Backup Job Failing, iSCSI Target Unreachable, Fix Is VLAN Config
- Symptoms: Backup Job Failing, iSCSI Target Unreachable, Fix Is VLAN Config
- rsync - Street Ops
- rsync Footguns
- tar & Compression - Footguns
- tar & Compression - Street-Level Ops
backup-restore¶
- Backup & Restore — Trivia & Interesting Facts
- Disaster Recovery & Backup Engineering - Street-Level Ops
- Disaster Recovery Footguns
- Primer
backup_restore¶
- Anti-Primer: Backup Restore
- Backup & Restore - Street-Level Ops
- Backup & Restore Footguns
- Backup Restore
bash¶
bash-scripting¶
- Advanced Bash Footguns
- Advanced Bash for Ops - Street-Level Ops
- Cron & Job Scheduling - Street-Level Ops
- Cron & Job Scheduling Footguns
- Deep Dive: Linux Performance Debugging
- Fleet Operations Footguns
- Fleet Operations at Scale - Street-Level Ops
- Linux Deep Triage
- Linux Ops Drills
- Linux Ops Footguns
- Linux System Administration - Street Ops
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Process Management - Street-Level Ops
- Process Management Footguns
- Regex & Text Wrangling - Street-Level Ops
- Regex & Text Wrangling Footguns
- Skillcheck: Bash
- Terminal Internals - Street Ops
- Terminal Internals Footguns
- Track: Foundations
bash_scripting¶
bgp¶
- Diagnostic Questions
- Grading Rubric
- Investigation: API Latency Spike, BGP Route Leak, Fix Is Network ACL
- Remediation: API Latency Spike, BGP Route Leak, Fix Is Network ACL
- Symptoms: API Latency Spike, BGP Route Leak, Fix Is Network ACL
bgp-evpn¶
bgp_evpn_vxlan¶
binary¶
- Binary and Floating Point Footguns
- Binary and Floating Point — Street-Level Ops
- Binary and Floats
- Primer
binary_and_floats¶
blackbox-exporter¶
bmc¶
- IPMI and ipmitool -- Street Ops
- IPMI and ipmitool Footguns
- Primer
- Primer
- Redfish -- Footguns
- Redfish -- Street Ops
boot-process¶
branching¶
build-systems¶
canary¶
- Diagnostic Questions
- Grading Rubric
- Investigation: Canary Deploy Looks Healthy, Actually Routing to Wrong Backend, Ingress Misconfigured
- Primer
- Progressive Delivery Footguns
- Progressive Delivery — Street-Level Ops
- Remediation: Canary Deploy Looks Healthy, Actually Routing to Wrong Backend, Ingress Misconfigured
- Symptoms: Canary Deploy Looks Healthy, Actually Routing to Wrong Backend, Ingress Misconfigured
cap-theorem¶
capacity-planning¶
capacity_planning¶
career¶
- Anti-Primer: Career Engineering
- Career Engineering Footguns
- Career Engineering for Ops People
- Career Engineering for Ops People - Street-Level Ops
- Corporate IT Fluency - Street-Level Ops
- Corporate IT Fluency Footguns
- Primer
- Primer
case-study¶
- Answer Key: The 5% That Can't Resolve
- Answer Key: The Alerts That Stopped Firing
- Answer Key: The Certificate That Works Sometimes
- Answer Key: The Cluster That Disagrees With Itself
- Answer Key: The Container That Exits Immediately
- Answer Key: The DR That Looks Ready But Isn't
- Answer Key: The Deploy That Didn't Deploy
- Answer Key: The Gateway That Returns 502
- Answer Key: The Job That Succeeded Wrong
- Answer Key: The Pods That Won't Schedule
- Answer Key: The Replica That Fell Behind
- Answer Key: The Requests That Vanish
- Answer Key: The Service That Won't Start
- Answer Key: The Session Store That Keeps Dying
- Answer Key: The Slow Death Nobody Noticed
- Case Studies
- Case Study: API Latency Spike — BGP Route Leak, Fix Is Network ACL
- Case Study: ARP Flux Duplicate IP
- Case Study: Alert Storm — Flapping Health Checks
- Case Study: Ansible Playbook Hangs — SSH Agent Forwarding Blocked by Firewall
- Case Study: Asymmetric Routing One Direction
- Case Study: BGP Peer Flapping
- Case Study: BIOS Settings Reset After CMOS
- Case Study: BMC Clock Skew Cert Failure
- Case Study: Backup Job Failing — iSCSI Target Unreachable, VLAN Misconfigured
- Case Study: Bonding Failover Not Working
- Case Study: CI Pipeline Fails — Docker Layer Cache Corruption
- Case Study: CNI Broken After Restart
- Case Study: Cable Management Wrong Port
- Case Study: Canary Deploy Routing to Wrong Backend — Ingress Misconfigured
- Case Study: Container Vuln Scanner False Positive Blocks Deploy
- Case Study: CoreDNS Timeout Pod DNS
- Case Study: CrashLoopBackOff No Logs
- Case Study: DHCP Relay Broken
- Case Study: DNS Looks Broken — TLS Expired, Fix Is Cert-Manager
- Case Study: DNS Resolution Slow
- Case Study: DNS Split Horizon Confusion
- Case Study: DaemonSet Blocks Eviction
- Case Study: Database Replication Lag — Root Cause Is RAID Degradation
- Case Study: Deployment Stuck — ImagePull Auth Failure, Vault Secret Rotation
- Case Study: Disk Full Root Services Down
- Case Study: Disk Full — Runaway Logs, Fix Is Loki Retention
- Case Study: Drain Blocked by PDB
- Case Study: Duplex Mismatch Symptoms
- Case Study: Firewall Shadow Rule
- Case Study: Firmware Update Boot Loop
- Case Study: Grafana Dashboard Empty — Prometheus Blocked by NetworkPolicy
- Case Study: HBA Firmware Mismatch
- Case Study: HPA Flapping — Metrics Server Clock Skew, Fix Is NTP
- Case Study: iptables Blocking Unexpected
- Case Study: ImagePullBackOff Registry Auth
- Case Study: Inode Exhaustion
- Case Study: Job Queue Backlog — Worker Pod CPU Throttled by cgroup
- Case Study: Jumbo Frames Partial
- Case Study: Kernel Soft Lockup
- Case Study: LACP Mismatch One Link Hot
- Case Study: Link Flaps Bad Optic
- Case Study: MTU Blackhole TLS Stalls
- Case Study: Memory ECC Errors Increasing
- Case Study: Multicast Not Crossing Router
- Case Study: NAT Exhaustion Intermittent
- Case Study: NVMe Drive Disappeared
- Case Study: Network Loop Broadcast Storm
- Case Study: Node NotReady — NIC Firmware Bug, Fix Is Ansible Playbook
- Case Study: Node Pressure Evictions
- Case Study: OOM Killer Events
- Case Study: OS Install Fails RAID Controller
- Case Study: OSPF Stuck In Exstart
- Case Study: PXE Boot Fails UEFI Mismatch
- Case Study: Persistent Volume Stuck Terminating
- Case Study: Pod OOMKilled — Memory Leak in Sidecar, Fix Is Helm Values
- Case Study: Power Supply Redundancy Lost
- Case Study: Proxy ARP Causing Issues
- Case Study: RAID Degraded Rebuild Latency
- Case Study: Rack PDU Overload Alert
- Case Study: Resource Quota Blocking Deploy
- Case Study: Runaway Logs Fill Disk
- Case Study: SELinux Denying Service
- Case Study: SSH Timeout — MTU Mismatch, Fix Is Terraform Variable
- Case Study: SSL Cert Chain Incomplete
- Case Study: Serial Console Garbled
- Case Study: Server Intermittent Reboot
- Case Study: Server Remote Console Lag
- Case Study: Service Mesh 503s — Envoy Misconfigured, RBAC Policy
- Case Study: Service No Endpoints
- Case Study: Source Routing Policy Miss
- Case Study: Stuck NFS Mount
- Case Study: Systemd Service Flapping
- Case Study: TCP RST After Idle
- Case Study: Terraform Apply Fails — State Lock Stuck, DynamoDB Throttle
- Case Study: Thermal Throttle Fan Failure
- Case Study: Time Sync Skew Breaks App
- Case Study: User Auth Failing — OIDC Cert Expired, Cloud KMS Rotation
- Case Study: VLAN Trunk Mistag
- Case Study: Zombie Processes Accumulating
- Case Study: iDRAC Unreachable OS Up
- Cross-Domain Incident Case Studies
- Datacenter Operations Case Studies
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist: BMC Clock Skew - Certificate Failure
- Grading Checklist: DHCP Not Working on Remote VLAN
- Grading Checklist: DNS Resolution Taking 5+ Seconds Intermittently
- Grading Checklist: Disk Full Root - Services Down
- Grading Checklist: Firmware Update Boot Loop
- Grading Checklist: Jumbo Frames Enabled But Some Paths Failing
- Grading Checklist: Link Flaps - Bad Optic
- Grading Checklist: Memory ECC Errors Increasing
- Grading Checklist: Multicast Traffic Not Crossing Router
- Grading Checklist: Network Experiencing Broadcast Storm and High CPU on Switches
- Grading Checklist: OSPF Adjacency Stuck in ExStart/Exchange State
- Grading Checklist: PXE Boot Fails - UEFI Mismatch
- Grading Checklist: Power Supply Redundancy Lost
- Grading Checklist: Proxy ARP Causing Unexpected Routing Behavior
- Grading Checklist: RAID Degraded Rebuild Latency
- Grading Checklist: TCP Connections Reset After Idle Period
- Grading Checklist: TLS Works From Some Clients But Fails From Others
- Grading Checklist: Thermal Throttle - Fan Failure
- Grading Checklist: Traffic From Specific Source Not Taking Expected Path
- Grading Checklist: iDRAC Unreachable, OS Up
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Incident Replay: ARP Flux — Duplicate IP Detection
- Incident Replay: Asymmetric Routing — Traffic Works One Direction Only
- Incident Replay: BGP Peer Flapping
- Incident Replay: BIOS Settings Reverted After CMOS Battery Replacement
- Incident Replay: BMC Clock Skew Causes Certificate Failure
- Incident Replay: CNI Broken After Node Restart
- Incident Replay: Cable Plugged Into Wrong Port
- Incident Replay: CoreDNS Timeout — Pod DNS Resolution Failing
- Incident Replay: CrashLoopBackOff with No Logs
- Incident Replay: DHCP Relay Broken
- Incident Replay: DNS Resolution Slow
- Incident Replay: DNS Split-Horizon Confusion
- Incident Replay: DaemonSet Blocks Node Eviction
- Incident Replay: Disk Full on Root Partition — Services Down
- Incident Replay: Duplex Mismatch Symptoms
- Incident Replay: Firewall Shadow Rule
- Incident Replay: Firmware Update Causes Boot Loop
- Incident Replay: HBA Firmware Mismatch
- Incident Replay: ImagePullBackOff — Registry Authentication Failure
- Incident Replay: Inode Exhaustion
- Incident Replay: Jumbo Frames Partial Deployment
- Incident Replay: Kernel Soft Lockup
- Incident Replay: LACP Mismatch — One Link Hot
- Incident Replay: Link Flaps from Bad Optic
- Incident Replay: MTU Blackhole — TLS Stalls
- Incident Replay: Memory ECC Errors Increasing
- Incident Replay: Multicast Not Crossing Router
- Incident Replay: NAT Exhaustion — Intermittent Connectivity
- Incident Replay: NVMe Drive Disappeared
- Incident Replay: Network Bonding Failover Not Working
- Incident Replay: Network Loop — Broadcast Storm
- Incident Replay: Node Drain Blocked by PDB
- Incident Replay: Node Pressure Evictions
- Incident Replay: OOM Killer Events
- Incident Replay: OS Install Fails — RAID Controller Not Detected
- Incident Replay: OSPF Stuck in ExStart
- Incident Replay: PXE Boot Fails — UEFI Mismatch
- Incident Replay: Persistent Volume Stuck Terminating
- Incident Replay: Power Supply Redundancy Lost
- Incident Replay: Proxy ARP Causing Issues
- Incident Replay: RAID Degraded — Rebuild Latency
- Incident Replay: Rack PDU Overload Alert
- Incident Replay: Resource Quota Blocking Deployment
- Incident Replay: Runaway Logs Fill Disk
- Incident Replay: SELinux Denying Service
- Incident Replay: SSL Certificate Chain Incomplete
- Incident Replay: Serial Console Output Garbled
- Incident Replay: Server Intermittent Reboots
- Incident Replay: Server Remote Console Lag
- Incident Replay: Service Has No Endpoints
- Incident Replay: Source Routing Policy Miss
- Incident Replay: Stuck NFS Mount
- Incident Replay: TCP RST After Idle
- Incident Replay: Thermal Throttling from Fan Failure
- Incident Replay: Time Sync Skew Breaks Application
- Incident Replay: VLAN Trunk Mistag
- Incident Replay: Zombie Processes Accumulating
- Incident Replay: iDRAC Unreachable but OS Running
- Incident Replay: iptables Blocking Unexpected Traffic
- Incident Replay: systemd Service Flapping
- Investigation: API Latency Spike, BGP Route Leak, Fix Is Network ACL
- Investigation: Alert Storm, Caused by Flapping Health Checks, Fix Is Probe Tuning
- Investigation: Ansible Playbook Hangs, SSH Agent Forwarding Broken, Root Cause Is Firewall Rule
- Investigation: Backup Job Failing, iSCSI Target Unreachable, Fix Is VLAN Config
- Investigation: CI Pipeline Fails, Docker Layer Cache Corruption, Fix Is Registry GC
- Investigation: Canary Deploy Looks Healthy, Actually Routing to Wrong Backend, Ingress Misconfigured
- Investigation: Container Image Vuln Scanner False Positive, Blocks Deploy Pipeline
- Investigation: DNS Looks Broken, TLS Is Expired, Fix Is in Cert-Manager
- Investigation: Database Replication Lag, Root Cause Is RAID Degradation
- Investigation: Deployment Stuck, ImagePull Auth Failure, Fix Is Vault Secret Rotation
- Investigation: Disk Full Alert, Cause Is Runaway Logs, Fix Is Loki Retention
- Investigation: Grafana Dashboard Empty, Prometheus Scrape Blocked by NetworkPolicy
- Investigation: HPA Flapping, Metrics Server Clock Skew, Fix Is NTP Config
- Investigation: Job Queue Backlog, Worker Pod CPU Throttled, Fix Is cgroup Config
- Investigation: Node NotReady, NIC Firmware Bug, Fix Is Ansible Playbook
- Investigation: Pod OOMKilled, Memory Leak Is in Sidecar, Fix Is Helm Values
- Investigation: SSH Timeout, MTU Mismatch, Fix Is Terraform Variable
- Investigation: Service Mesh 503s, Envoy Misconfigured, Root Cause Is RBAC Policy
- Investigation: Terraform Apply Fails, State Lock Stuck, Root Cause Is DynamoDB Throttle
- Investigation: User Auth Failing, OIDC Cert Expired, Fix Is Cloud KMS Rotation
- Kubernetes Operations Case Studies
- Linux Operations Case Studies
- Networking Case Studies
- Ops Archaeology: Reverse-Engineering Production Systems
- Ops Archaeology: The 5% That Can't Resolve
- Ops Archaeology: The 5% That Can't Resolve
- Ops Archaeology: The Alerts That Stopped Firing
- Ops Archaeology: The Alerts That Stopped Firing
- Ops Archaeology: The Certificate That Works Sometimes
- Ops Archaeology: The Certificate That Works Sometimes
- Ops Archaeology: The Cluster That Disagrees With Itself
- Ops Archaeology: The Cluster That Disagrees With Itself
- Ops Archaeology: The Container That Exits Immediately
- Ops Archaeology: The Container That Exits Immediately
- Ops Archaeology: The DR That Looks Ready But Isn't
- Ops Archaeology: The DR That Looks Ready But Isn't
- Ops Archaeology: The Deploy That Didn't Deploy
- Ops Archaeology: The Deploy That Didn't Deploy
- Ops Archaeology: The Gateway That Returns 502
- Ops Archaeology: The Gateway That Returns 502
- Ops Archaeology: The Job That Succeeded Wrong
- Ops Archaeology: The Job That Succeeded Wrong
- Ops Archaeology: The Pods That Won't Schedule
- Ops Archaeology: The Pods That Won't Schedule
- Ops Archaeology: The Replica That Fell Behind
- Ops Archaeology: The Replica That Fell Behind
- Ops Archaeology: The Requests That Vanish
- Ops Archaeology: The Requests That Vanish
- Ops Archaeology: The Service That Won't Start
- Ops Archaeology: The Service That Won't Start
- Ops Archaeology: The Session Store That Keeps Dying
- Ops Archaeology: The Session Store That Keeps Dying
- Ops Archaeology: The Slow Death Nobody Noticed
- Ops Archaeology: The Slow Death Nobody Noticed
- Progressive Hints
- Progressive Hints
- Progressive Hints
- Progressive Hints
- Progressive Hints
- Progressive Hints
- Progressive Hints
- Progressive Hints
- Progressive Hints
- Progressive Hints
- Progressive Hints
- Progressive Hints
- Progressive Hints
- Progressive Hints
- Progressive Hints
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions: BMC Clock Skew - Certificate Failure
- Questions: DHCP Not Working on Remote VLAN
- Questions: DNS Resolution Taking 5+ Seconds Intermittently
- Questions: Disk Full Root - Services Down
- Questions: Firmware Update Boot Loop
- Questions: Jumbo Frames Enabled But Some Paths Failing
- Questions: Link Flaps - Bad Optic
- Questions: Memory ECC Errors Increasing
- Questions: Multicast Traffic Not Crossing Router
- Questions: Network Experiencing Broadcast Storm and High CPU on Switches
- Questions: OSPF Adjacency Stuck in ExStart/Exchange State
- Questions: PXE Boot Fails - UEFI Mismatch
- Questions: Power Supply Redundancy Lost
- Questions: Proxy ARP Causing Unexpected Routing Behavior
- Questions: RAID Degraded Rebuild Latency
- Questions: TCP Connections Reset After Idle Period
- Questions: TLS Works From Some Clients But Fails From Others
- Questions: Thermal Throttle - Fan Failure
- Questions: Traffic From Specific Source Not Taking Expected Path
- Questions: iDRAC Unreachable, OS Up
- Remediation: API Latency Spike, BGP Route Leak, Fix Is Network ACL
- Remediation: Alert Storm, Caused by Flapping Health Checks, Fix Is Probe Tuning
- Remediation: Ansible Playbook Hangs, SSH Agent Forwarding Broken, Root Cause Is Firewall Rule
- Remediation: Backup Job Failing, iSCSI Target Unreachable, Fix Is VLAN Config
- Remediation: CI Pipeline Fails, Docker Layer Cache Corruption, Fix Is Registry GC
- Remediation: Canary Deploy Looks Healthy, Actually Routing to Wrong Backend, Ingress Misconfigured
- Remediation: Container Image Vuln Scanner False Positive, Blocks Deploy Pipeline
- Remediation: DNS Looks Broken, TLS Is Expired, Fix Is in Cert-Manager
- Remediation: Database Replication Lag, Root Cause Is RAID Degradation
- Remediation: Deployment Stuck, ImagePull Auth Failure, Fix Is Vault Secret Rotation
- Remediation: Disk Full Alert, Cause Is Runaway Logs, Fix Is Loki Retention
- Remediation: Grafana Dashboard Empty, Prometheus Scrape Blocked by NetworkPolicy
- Remediation: HPA Flapping, Metrics Server Clock Skew, Fix Is NTP Config
- Remediation: Job Queue Backlog, Worker Pod CPU Throttled, Fix Is cgroup Config
- Remediation: Node NotReady, NIC Firmware Bug, Fix Is Ansible Playbook
- Remediation: Pod OOMKilled, Memory Leak Is in Sidecar, Fix Is Helm Values
- Remediation: SSH Timeout, MTU Mismatch, Fix Is Terraform Variable
- Remediation: Service Mesh 503s, Envoy Misconfigured, Root Cause Is RBAC Policy
- Remediation: Terraform Apply Fails, State Lock Stuck, Root Cause Is DynamoDB Throttle
- Remediation: User Auth Failing, OIDC Cert Expired, Fix Is Cloud KMS Rotation
- Solution
- Solution
- Solution
- Solution
- Solution
- Solution
- Solution
- Solution
- Solution
- Solution
- Solution
- Solution
- Solution
- Solution
- Solution
- Solution
- Solution
- Solution
- Solution
- Solution
- Solution: ARP Flux / Duplicate IP
- Solution: Asymmetric Routing / One-Direction Failure
- Solution: BGP Peer Flapping
- Solution: BIOS Settings Reset After CMOS Battery Replacement
- Solution: BMC Clock Skew - Certificate Failure
- Solution: Bonding Failover Not Working
- Solution: DHCP Not Working on Remote VLAN
- Solution: DNS Resolution Taking 5+ Seconds Intermittently
- Solution: DNS Split-Horizon Confusion
- Solution: Disk Full Root - Services Down
- Solution: Duplex Mismatch
- Solution: Firewall Shadow Rule
- Solution: Firmware Update Boot Loop
- Solution: HBA Firmware Mismatch Causing I/O Errors
- Solution: Jumbo Frames Enabled But Some Paths Failing
- Solution: LACP Mismatch / One Link Hot
- Solution: Link Flaps - Bad Optic
- Solution: MTU Black Hole / TLS Stalls
- Solution: Memory ECC Errors Increasing
- Solution: Multicast Traffic Not Crossing Router
- Solution: NAT Port Exhaustion / Intermittent Failures
- Solution: NVMe Drive Disappeared After Reboot
- Solution: Network Experiencing Broadcast Storm and High CPU on Switches
- Solution: OS Install Fails - RAID Controller Driver Missing
- Solution: OSPF Adjacency Stuck in ExStart/Exchange State
- Solution: PDU Overload Warning - Phase Imbalance
- Solution: PXE Boot Fails - UEFI Mismatch
- Solution: Power Supply Redundancy Lost
- Solution: Proxy ARP Causing Unexpected Routing Behavior
- Solution: RAID Degraded Rebuild Latency
- Solution: Remote KVM/Console Extremely Laggy
- Solution: Serial-over-LAN Output Garbled
- Solution: Server Cabled to Wrong Switch Port
- Solution: Server Intermittent Reboots
- Solution: TCP Connections Reset After Idle Period
- Solution: TLS Works From Some Clients But Fails From Others
- Solution: Thermal Throttle - Fan Failure
- Solution: Traffic From Specific Source Not Taking Expected Path
- Solution: VLAN Trunk Mistag
- Solution: iDRAC Unreachable, OS Up
- Symptoms
- Symptoms
- Symptoms
- Symptoms
- Symptoms
- Symptoms
- Symptoms
- Symptoms
- Symptoms
- Symptoms
- Symptoms
- Symptoms
- Symptoms
- Symptoms
- Symptoms
- Symptoms
- Symptoms
- Symptoms
- Symptoms
- Symptoms
- Symptoms: API Latency Spike, BGP Route Leak, Fix Is Network ACL
- Symptoms: ARP Flux / Duplicate IP
- Symptoms: Alert Storm, Caused by Flapping Health Checks, Fix Is Probe Tuning
- Symptoms: Ansible Playbook Hangs, SSH Agent Forwarding Broken, Root Cause Is Firewall Rule
- Symptoms: Asymmetric Routing / One-Direction Failure
- Symptoms: BGP Peer Flapping
- Symptoms: BIOS Settings Reverted After CMOS Battery Replacement
- Symptoms: BMC Clock Skew - Certificate Failure
- Symptoms: Backup Job Failing, iSCSI Target Unreachable, Fix Is VLAN Config
- Symptoms: CI Pipeline Fails, Docker Layer Cache Corruption, Fix Is Registry GC
- Symptoms: Canary Deploy Looks Healthy, Actually Routing to Wrong Backend, Ingress Misconfigured
- Symptoms: Container Image Vuln Scanner False Positive, Blocks Deploy Pipeline
- Symptoms: DHCP Not Working on Remote VLAN
- Symptoms: DNS Looks Broken, TLS Is Expired, Fix Is in Cert-Manager
- Symptoms: DNS Resolution Taking 5+ Seconds Intermittently
- Symptoms: DNS Split-Horizon Confusion
- Symptoms: Database Replication Lag, Root Cause Is RAID Degradation
- Symptoms: Deployment Stuck, ImagePull Auth Failure, Fix Is Vault Secret Rotation
- Symptoms: Disk Full Alert, Cause Is Runaway Logs, Fix Is Loki Retention
- Symptoms: Disk Full Root - Services Down
- Symptoms: Duplex Mismatch
- Symptoms: Firewall Shadow Rule
- Symptoms: Firmware Update Boot Loop
- Symptoms: Grafana Dashboard Empty, Prometheus Scrape Blocked by NetworkPolicy
- Symptoms: HBA Firmware Mismatch Causing I/O Errors
- Symptoms: HPA Flapping, Metrics Server Clock Skew, Fix Is NTP Config
- Symptoms: Job Queue Backlog, Worker Pod CPU Throttled, Fix Is cgroup Config
- Symptoms: Jumbo Frames Enabled But Some Paths Failing
- Symptoms: LACP Mismatch / One Link Hot
- Symptoms: Link Flaps - Bad Optic
- Symptoms: MTU Black Hole / TLS Stalls
- Symptoms: Memory ECC Errors Increasing
- Symptoms: Multicast Traffic Not Crossing Router
- Symptoms: NAT Port Exhaustion / Intermittent Failures
- Symptoms: NVMe Drive Disappeared After Reboot
- Symptoms: Network Bonding Failover Not Working
- Symptoms: Network Experiencing Broadcast Storm and High CPU on Switches
- Symptoms: Node NotReady, NIC Firmware Bug, Fix Is Ansible Playbook
- Symptoms: OS Installation Cannot See Disks
- Symptoms: OSPF Adjacency Stuck in ExStart/Exchange State
- Symptoms: PDU Reporting Overload Warning
- Symptoms: PXE Boot Fails - UEFI Mismatch
- Symptoms: Pod OOMKilled, Memory Leak Is in Sidecar, Fix Is Helm Values
- Symptoms: Power Supply Redundancy Lost
- Symptoms: Proxy ARP Causing Unexpected Routing Behavior
- Symptoms: RAID Degraded Rebuild Latency
- Symptoms: Remote KVM/Console Extremely Laggy
- Symptoms: SSH Timeout, MTU Mismatch, Fix Is Terraform Variable
- Symptoms: Serial-over-LAN Output Garbled
- Symptoms: Server Cabled to Wrong Switch Port / Wrong VLAN
- Symptoms: Server Randomly Rebooting Every Few Hours
- Symptoms: Service Mesh 503s, Envoy Misconfigured, Root Cause Is RBAC Policy
- Symptoms: TCP Connections Reset After Idle Period
- Symptoms: TLS Works From Some Clients But Fails From Others
- Symptoms: Terraform Apply Fails, State Lock Stuck, Root Cause Is DynamoDB Throttle
- Symptoms: Thermal Throttle - Fan Failure
- Symptoms: Traffic From Specific Source Not Taking Expected Path
- Symptoms: User Auth Failing, OIDC Cert Expired, Fix Is Cloud KMS Rotation
- Symptoms: VLAN Trunk Mistag
- Symptoms: iDRAC Unreachable, OS Up
ceph¶
cert-manager¶
- Diagnostic Questions
- Grading Rubric
- Investigation: DNS Looks Broken, TLS Is Expired, Fix Is in Cert-Manager
- Primer
- Remediation: DNS Looks Broken, TLS Is Expired, Fix Is in Cert-Manager
- Symptoms: DNS Looks Broken, TLS Is Expired, Fix Is in Cert-Manager
- cert-manager Footguns
- cert-manager — Street-Level Ops
cert_manager¶
certificates¶
certification¶
certification_prep¶
- Certification Exam Prep
- Certification Prep: AWS SAA — Solutions Architect Associate
- Certification Prep: CKA — Certified Kubernetes Administrator
- Certification Prep: CKAD — Certified Kubernetes Application Developer
- Certification Prep: CKS — Certified Kubernetes Security Specialist
- Certification Prep: HashiCorp Terraform Associate
- Certification Prep: PCA — Prometheus Certified Associate
cgroup¶
- Diagnostic Questions
- Grading Rubric
- Investigation: Job Queue Backlog, Worker Pod CPU Throttled, Fix Is cgroup Config
- Remediation: Job Queue Backlog, Worker Pod CPU Throttled, Fix Is cgroup Config
- Symptoms: Job Queue Backlog, Worker Pod CPU Throttled, Fix Is cgroup Config
cgroups_namespaces¶
change-management¶
change_management¶
chaos-engineering¶
- Chaos Engineering & Fault Injection - Street-Level Ops
- Chaos Engineering Footguns
- Chaos Engineering — Trivia & Interesting Facts
- Primer
cheat-sheet¶
- AI-Assisted DevOps Cookbook
- CI Pipeline Documentation
- Deep Dive: CI/CD Pipeline Architecture
- Deep Dive: Containers How They Really Work
- Deep Dive: Docker Image Internals
- Deep Dive: Kubernetes Networking
- Deep Dive: Kubernetes Pod Lifecycle
- Deep Dive: Linux Boot Sequence
- Deep Dive: Linux Memory Management
- Deep Dive: Linux Network Packet Flow
- Deep Dive: Linux Performance Debugging
- Deep Dive: Systemd Architecture
- Deep Dive: Terraform State Internals
- DevOps Learning Roadmap
- Track: Containers
- Track: Foundations
- Track: Helm & Release Ops
- Track: Incident Response
- Track: Infrastructure
- Track: Kubernetes Core
- Track: Observability
- kubectl Debugging Cheatsheet
cheatsheet¶
- Alerting Rules Cheatsheet
- Bash Cheatsheet
- Cheatsheet
- CI/CD Cheatsheet
- Cloud Deep Dive Cheatsheet
- Cloud Ops Cheatsheet
- Container Runtime Debug Cheatsheet
- Database Ops Cheatsheet
- Datacenter Cheatsheet
- Docker Cheatsheet
- Etcd Operations Cheatsheet
- FinOps Cheatsheet
- Git Cheatsheet
- GitOps ArgoCD Cheatsheet
- Helm Cheatsheet
- K8s Operators Cheatsheet
- K8s YAML Patterns Cheatsheet
- Kubernetes Core Cheatsheet
- Linux Ops Cheatsheet
- Modern CLI Cheatsheet
- Networking Cheatsheet
- Observability Cheatsheet
- Overview
- Phone Interview DevOps Linux Cheatsheet
- Policy Engines Cheatsheet
- Postmortem SLO Cheatsheet
- Python DevOps Cheatsheet
- Secrets Management Cheatsheet
- Security Cheatsheet
- Service Mesh Cheatsheet
- SSH Cheatsheet
- Systemd Cheatsheet
- Terraform Cheatsheet
- TLS PKI Cheatsheet
- Troubleshooting Flows Cheatsheet
checkly¶
ci-cd¶
- Diagnostic Questions
- Grading Rubric
- Investigation: CI Pipeline Fails, Docker Layer Cache Corruption, Fix Is Registry GC
- Remediation: CI Pipeline Fails, Docker Layer Cache Corruption, Fix Is Registry GC
- Symptoms: CI Pipeline Fails, Docker Layer Cache Corruption, Fix Is Registry GC
cicd¶
- Anti-Primer: CI/CD
- Argo CD & GitOps — Trivia & Interesting Facts
- Argo Workflows — Trivia & Interesting Facts
- CI Pipeline Documentation
- CI/CD Drills
- CI/CD Footguns
- CI/CD Pipelines & Patterns
- CI/CD Pipelines - Street Ops
- CI/CD — Trivia & Interesting Facts
- Comparison: CI Platforms
- Comparison: GitOps CD
- Dagger — Trivia & Interesting Facts
- Deep Dive: CI/CD Pipeline Architecture
- Feature Flags — Trivia & Interesting Facts
- GitHub Actions — Trivia & Interesting Facts
- GitOps — Trivia & Interesting Facts
- Interview: CI Vuln Scan Failed
- Platform Engineering Footguns
- Platform Engineering Patterns - Street-Level Ops
- Primer
- Primer
- Skillcheck: CI/CD
cicd_patterns¶
cilium¶
- Anti-Primer: Cilium
- Cilium & eBPF Networking
- Cilium & eBPF Networking - Street-Level Ops
- Cilium & eBPF Networking Footguns
cisco¶
- Anti-Primer: Cisco Fundamentals for DevOps
- Cisco Fundamentals -- Street Ops
- Cisco Fundamentals Footguns
- Cisco Fundamentals for DevOps
- Primer
- Scenario: VLAN Trunk Mismatch
claude_code¶
cli¶
- Modern CLI Drills
- Modern CLI Tools - Street Ops
- Modern CLI Tools Footguns
- Modern CLI Workflows — Trivia & Interesting Facts
- Modern CLI — Trivia & Interesting Facts
- Primer
- Primer
- Regex & Text Wrangling - Street-Level Ops
- Regex & Text Wrangling Footguns
- Skillcheck: Modern CLI Tools
- fd — Trivia & Interesting Facts
- fzf — Trivia & Interesting Facts
- jq — Trivia & Interesting Facts
cli-tools¶
- Linux Text Processing
- Linux Text Processing - Street-Level Ops
- Linux Text Processing Footguns
- Linux Text Processing — Trivia & History
- Make & Build Systems — Footguns
- Make & Build Systems — Street Ops
- Modern CLI Workflows
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- awk — Footguns
- awk — Street-Level Ops
- awk: The Record/Field Processor
- curl & wget
- curl & wget — Footguns
- curl & wget — Street-Level Ops
- curl & wget — Trivia & Interesting Facts
- find - Footguns & Pitfalls
- find - Street-Level Ops
- grep & Regular Expressions
- grep & Regular Expressions - Footguns
- grep & Regular Expressions - Street-Level Ops
- rsync - Street Ops
- rsync Footguns
- sed — Footguns
- sed — Street-Level Ops
- sed: The Stream Editor
- tar & Compression - Footguns
- tar & Compression - Street-Level Ops
- xargs - Footguns & Pitfalls
- xargs - Street Ops
clock-skew¶
- Diagnostic Questions
- Grading Rubric
- Investigation: HPA Flapping, Metrics Server Clock Skew, Fix Is NTP Config
- Remediation: HPA Flapping, Metrics Server Clock Skew, Fix Is NTP Config
- Symptoms: HPA Flapping, Metrics Server Clock Skew, Fix Is NTP Config
cloud¶
- AWS EC2
- AWS EC2 - Street-Level Ops
- AWS EC2 Footguns
- AWS EC2 — Trivia & Interesting Facts
- AWS IAM
- AWS IAM - Street-Level Ops
- AWS IAM Footguns
- AWS IAM — Trivia & Interesting Facts
- AWS Lambda
- AWS Lambda - Street-Level Ops
- AWS Lambda Footguns
- AWS Lambda — Trivia & Interesting Facts
- AWS Networking
- AWS Networking - Street-Level Ops
- AWS Networking Footguns
- AWS Networking — Trivia & Interesting Facts
- AWS Route 53
- AWS Route 53 - Street-Level Ops
- AWS Route 53 Footguns
- AWS S3 Deep Dive
- AWS Troubleshooting — Trivia & Interesting Facts
- Azure Troubleshooting — Trivia & Interesting Facts
- Cloud Deep Dive Drills
- Cloud Deep Dive — Trivia & Interesting Facts
- Cloud Deep-Dive Footguns
- Cloud Ops Basics — Trivia & Interesting Facts
- Cloud Ops Drills
- Cloud Provider Deep-Dive - Street-Level Ops
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- FinOps — Trivia & Interesting Facts
- GCP Troubleshooting — Trivia & Interesting Facts
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Investigation: SSH Timeout, MTU Mismatch, Fix Is Terraform Variable
- Investigation: Terraform Apply Fails, State Lock Stuck, Root Cause Is DynamoDB Throttle
- Investigation: User Auth Failing, OIDC Cert Expired, Fix Is Cloud KMS Rotation
- Multi-Cluster & Federation - Exercises & Reference
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Remediation: SSH Timeout, MTU Mismatch, Fix Is Terraform Variable
- Remediation: Terraform Apply Fails, State Lock Stuck, Root Cause Is DynamoDB Throttle
- Remediation: User Auth Failing, OIDC Cert Expired, Fix Is Cloud KMS Rotation
- Runbook: VPC IP Exhaustion
- Skillcheck: Cloud Providers
- Symptoms: SSH Timeout, MTU Mismatch, Fix Is Terraform Variable
- Symptoms: Terraform Apply Fails, State Lock Stuck, Root Cause Is DynamoDB Throttle
- Symptoms: User Auth Failing, OIDC Cert Expired, Fix Is Cloud KMS Rotation
cloud-deep-dive¶
- Cloud Deep Dive Drills
- Cloud Deep-Dive Footguns
- Cloud Ops Drills
- Cloud Provider Deep-Dive - Street-Level Ops
- Multi-Cluster & Federation - Exercises & Reference
- Primer
- Runbook: VPC IP Exhaustion
- Skillcheck: Cloud Providers
cloud_deep_dive¶
cloud_ops_basics¶
commands¶
comparisons¶
compliance¶
- Compliance & Audit Automation - Street-Level Ops
- Compliance & Audit Automation Footguns
- Compliance Automation — Trivia & Interesting Facts
- Primer
- Primer
- SELinux & Linux Hardening - Street-Level Ops
- SELinux & Linux Hardening Footguns
compliance_automation¶
concepts¶
concurrency¶
config¶
configuration¶
configuration-management¶
configuration_management¶
- Configuration Management
- How We Got Here: Configuration Management
- How We Got Here: Infrastructure as Code
consensus¶
consul¶
container-runtime¶
container_images¶
containers¶
- Container Base Images — Footguns & Pitfalls
- Container Base Images — Street Ops
- Container Base Images — Trivia & Interesting Facts
- Containers Deep Dive
- Containers Deep Dive - Footguns & Pitfalls
- Containers Deep Dive - Street-Level Ops
- Containers Deep Dive — Trivia & Interesting Facts
- Docker — Trivia & Interesting Facts
- Primer
- Primer
- Primer
- cgroups & Linux Namespaces - Street Ops
- cgroups & Namespaces Footguns
containers_deep_dive¶
continuous-profiling¶
continuous_profiling¶
corporate-it¶
corporate_it¶
cpu-throttle¶
- Diagnostic Questions
- Grading Rubric
- Investigation: Job Queue Backlog, Worker Pod CPU Throttled, Fix Is cgroup Config
- Remediation: Job Queue Backlog, Worker Pod CPU Throttled, Fix Is cgroup Config
- Symptoms: Job Queue Backlog, Worker Pod CPU Throttled, Fix Is cgroup Config
crashloop¶
- Kubernetes Node Lifecycle & Cluster Upgrades
- Kubernetes Ops Footguns
- Practical Kubernetes Ops - Street Ops
- Primer
crashloopbackoff¶
- Anti-Primer: CrashLoopBackOff
- CrashLoopBackOff
- CrashLoopBackOff - Street-Level Ops
- CrashLoopBackOff Footguns
- Thinking Out Loud: CrashLoopBackOff
cron¶
- Anti-Primer: Cron Scheduling
- Cron & Job Scheduling
- Cron & Job Scheduling - Street-Level Ops
- Cron & Job Scheduling Footguns
- Primer
cross-domain¶
- Case Study: API Latency Spike — BGP Route Leak, Fix Is Network ACL
- Case Study: Alert Storm — Flapping Health Checks
- Case Study: Ansible Playbook Hangs — SSH Agent Forwarding Blocked by Firewall
- Case Study: Backup Job Failing — iSCSI Target Unreachable, VLAN Misconfigured
- Case Study: CI Pipeline Fails — Docker Layer Cache Corruption
- Case Study: Canary Deploy Routing to Wrong Backend — Ingress Misconfigured
- Case Study: Container Vuln Scanner False Positive Blocks Deploy
- Case Study: DNS Looks Broken — TLS Expired, Fix Is Cert-Manager
- Case Study: Database Replication Lag — Root Cause Is RAID Degradation
- Case Study: Deployment Stuck — ImagePull Auth Failure, Vault Secret Rotation
- Case Study: Disk Full — Runaway Logs, Fix Is Loki Retention
- Case Study: Grafana Dashboard Empty — Prometheus Blocked by NetworkPolicy
- Case Study: HPA Flapping — Metrics Server Clock Skew, Fix Is NTP
- Case Study: Job Queue Backlog — Worker Pod CPU Throttled by cgroup
- Case Study: Node NotReady — NIC Firmware Bug, Fix Is Ansible Playbook
- Case Study: Pod OOMKilled — Memory Leak in Sidecar, Fix Is Helm Values
- Case Study: SSH Timeout — MTU Mismatch, Fix Is Terraform Variable
- Case Study: Service Mesh 503s — Envoy Misconfigured, RBAC Policy
- Case Study: Terraform Apply Fails — State Lock Stuck, DynamoDB Throttle
- Case Study: User Auth Failing — OIDC Cert Expired, Cloud KMS Rotation
- Cross-Domain
- Cross-Domain Incident Case Studies
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Investigation: API Latency Spike, BGP Route Leak, Fix Is Network ACL
- Investigation: Alert Storm, Caused by Flapping Health Checks, Fix Is Probe Tuning
- Investigation: Ansible Playbook Hangs, SSH Agent Forwarding Broken, Root Cause Is Firewall Rule
- Investigation: Backup Job Failing, iSCSI Target Unreachable, Fix Is VLAN Config
- Investigation: CI Pipeline Fails, Docker Layer Cache Corruption, Fix Is Registry GC
- Investigation: Canary Deploy Looks Healthy, Actually Routing to Wrong Backend, Ingress Misconfigured
- Investigation: Container Image Vuln Scanner False Positive, Blocks Deploy Pipeline
- Investigation: DNS Looks Broken, TLS Is Expired, Fix Is in Cert-Manager
- Investigation: Database Replication Lag, Root Cause Is RAID Degradation
- Investigation: Deployment Stuck, ImagePull Auth Failure, Fix Is Vault Secret Rotation
- Investigation: Disk Full Alert, Cause Is Runaway Logs, Fix Is Loki Retention
- Investigation: Grafana Dashboard Empty, Prometheus Scrape Blocked by NetworkPolicy
- Investigation: HPA Flapping, Metrics Server Clock Skew, Fix Is NTP Config
- Investigation: Job Queue Backlog, Worker Pod CPU Throttled, Fix Is cgroup Config
- Investigation: Node NotReady, NIC Firmware Bug, Fix Is Ansible Playbook
- Investigation: Pod OOMKilled, Memory Leak Is in Sidecar, Fix Is Helm Values
- Investigation: SSH Timeout, MTU Mismatch, Fix Is Terraform Variable
- Investigation: Service Mesh 503s, Envoy Misconfigured, Root Cause Is RBAC Policy
- Investigation: Terraform Apply Fails, State Lock Stuck, Root Cause Is DynamoDB Throttle
- Investigation: User Auth Failing, OIDC Cert Expired, Fix Is Cloud KMS Rotation
- Remediation: API Latency Spike, BGP Route Leak, Fix Is Network ACL
- Remediation: Alert Storm, Caused by Flapping Health Checks, Fix Is Probe Tuning
- Remediation: Ansible Playbook Hangs, SSH Agent Forwarding Broken, Root Cause Is Firewall Rule
- Remediation: Backup Job Failing, iSCSI Target Unreachable, Fix Is VLAN Config
- Remediation: CI Pipeline Fails, Docker Layer Cache Corruption, Fix Is Registry GC
- Remediation: Canary Deploy Looks Healthy, Actually Routing to Wrong Backend, Ingress Misconfigured
- Remediation: Container Image Vuln Scanner False Positive, Blocks Deploy Pipeline
- Remediation: DNS Looks Broken, TLS Is Expired, Fix Is in Cert-Manager
- Remediation: Database Replication Lag, Root Cause Is RAID Degradation
- Remediation: Deployment Stuck, ImagePull Auth Failure, Fix Is Vault Secret Rotation
- Remediation: Disk Full Alert, Cause Is Runaway Logs, Fix Is Loki Retention
- Remediation: Grafana Dashboard Empty, Prometheus Scrape Blocked by NetworkPolicy
- Remediation: HPA Flapping, Metrics Server Clock Skew, Fix Is NTP Config
- Remediation: Job Queue Backlog, Worker Pod CPU Throttled, Fix Is cgroup Config
- Remediation: Node NotReady, NIC Firmware Bug, Fix Is Ansible Playbook
- Remediation: Pod OOMKilled, Memory Leak Is in Sidecar, Fix Is Helm Values
- Remediation: SSH Timeout, MTU Mismatch, Fix Is Terraform Variable
- Remediation: Service Mesh 503s, Envoy Misconfigured, Root Cause Is RBAC Policy
- Remediation: Terraform Apply Fails, State Lock Stuck, Root Cause Is DynamoDB Throttle
- Remediation: User Auth Failing, OIDC Cert Expired, Fix Is Cloud KMS Rotation
- Symptoms: API Latency Spike, BGP Route Leak, Fix Is Network ACL
- Symptoms: Alert Storm, Caused by Flapping Health Checks, Fix Is Probe Tuning
- Symptoms: Ansible Playbook Hangs, SSH Agent Forwarding Broken, Root Cause Is Firewall Rule
- Symptoms: Backup Job Failing, iSCSI Target Unreachable, Fix Is VLAN Config
- Symptoms: CI Pipeline Fails, Docker Layer Cache Corruption, Fix Is Registry GC
- Symptoms: Canary Deploy Looks Healthy, Actually Routing to Wrong Backend, Ingress Misconfigured
- Symptoms: Container Image Vuln Scanner False Positive, Blocks Deploy Pipeline
- Symptoms: DNS Looks Broken, TLS Is Expired, Fix Is in Cert-Manager
- Symptoms: Database Replication Lag, Root Cause Is RAID Degradation
- Symptoms: Deployment Stuck, ImagePull Auth Failure, Fix Is Vault Secret Rotation
- Symptoms: Disk Full Alert, Cause Is Runaway Logs, Fix Is Loki Retention
- Symptoms: Grafana Dashboard Empty, Prometheus Scrape Blocked by NetworkPolicy
- Symptoms: HPA Flapping, Metrics Server Clock Skew, Fix Is NTP Config
- Symptoms: Job Queue Backlog, Worker Pod CPU Throttled, Fix Is cgroup Config
- Symptoms: Node NotReady, NIC Firmware Bug, Fix Is Ansible Playbook
- Symptoms: Pod OOMKilled, Memory Leak Is in Sidecar, Fix Is Helm Values
- Symptoms: SSH Timeout, MTU Mismatch, Fix Is Terraform Variable
- Symptoms: Service Mesh 503s, Envoy Misconfigured, Root Cause Is RBAC Policy
- Symptoms: Terraform Apply Fails, State Lock Stuck, Root Cause Is DynamoDB Throttle
- Symptoms: User Auth Failing, OIDC Cert Expired, Fix Is Cloud KMS Rotation
crossplane¶
css¶
css_fundamentals¶
curl_and_wget¶
curriculum¶
- Coverage Gaps Analysis
- K8s Learning Plan
- Level 1: Foundations
- Level 2: Container Platform
- Level 3: Production Kubernetes
- Level 4: Operations & Observability
- Level 5: SRE & Incident Response
- Level 6: Advanced Platform Engineering
- Level 7: SRE & Cloud Operations
- Master Curriculum: 40 Weeks
- Track: Advanced Platform Engineering
- Track: Cloud & FinOps
- Track: Health & Wellness
- Track: Learning & Cognition
- Track: Life Skills & Practical Knowledge
- Track: Modern CLI Tools
- Track: Professional Skills
- Track: SRE & Reliability Engineering
- Training Curriculum
dagger¶
data¶
data-fetching¶
data-management¶
data_modeling¶
database¶
- Diagnostic Questions
- Grading Rubric
- Investigation: Database Replication Lag, Root Cause Is RAID Degradation
- Remediation: Database Replication Lag, Root Cause Is RAID Degradation
- Symptoms: Database Replication Lag, Root Cause Is RAID Degradation
database-ops¶
- Database Operations - Street-Level Ops
- Database Ops Drills
- Database Ops Footguns
- Interview: Database Failover During Deploy
- MongoDB Operations Footguns
- MongoDB Operations — Street-Level Ops
- MySQL / MariaDB Operations Footguns
- MySQL / MariaDB Operations — Street-Level Ops
- Primer
- Primer
- Primer
- Primer
- SQL Fundamentals Footguns
- SQL Fundamentals — Street-Level Ops
- Skillcheck: Database Ops
database_internals¶
- Anti-Primer: Database Internals
- Database Internals
- Database Internals - Street-Level Ops
- Database Internals Footguns
database_ops¶
databases¶
- Database Internals — Trivia & Interesting Facts
- Database Operations — Trivia & Interesting Facts
- Databases
- Elasticsearch — Trivia & Interesting Facts
- Kafka — Trivia & Interesting Facts
- MongoDB Operations — Trivia & Interesting Facts
- MySQL Operations — Trivia & Interesting Facts
- PostgreSQL — Trivia & Interesting Facts
- RabbitMQ — Trivia & Interesting Facts
- Redis — Trivia & Interesting Facts
- SQL Fundamentals — Trivia & Interesting Facts
- SQLite — Trivia & Interesting Facts
datacenter¶
- Bare Metal Provisioning — Trivia & Interesting Facts
- Bare-Metal Provisioning - Street-Level Ops
- Bare-Metal Provisioning Footguns
- Case Study: BIOS Settings Reset After CMOS
- Case Study: BMC Clock Skew Cert Failure
- Case Study: Bonding Failover Not Working
- Case Study: Cable Management Wrong Port
- Case Study: Disk Full Root Services Down
- Case Study: Firmware Update Boot Loop
- Case Study: HBA Firmware Mismatch
- Case Study: Link Flaps Bad Optic
- Case Study: Memory ECC Errors Increasing
- Case Study: NVMe Drive Disappeared
- Case Study: OS Install Fails RAID Controller
- Case Study: PXE Boot Fails UEFI Mismatch
- Case Study: Power Supply Redundancy Lost
- Case Study: RAID Degraded Rebuild Latency
- Case Study: Rack PDU Overload Alert
- Case Study: Serial Console Garbled
- Case Study: Server Intermittent Reboot
- Case Study: Server Remote Console Lag
- Case Study: Thermal Throttle Fan Failure
- Case Study: iDRAC Unreachable OS Up
- Datacenter & Server Hardware - Street Ops
- Datacenter Advanced Operations
- Datacenter Drills
- Datacenter Footguns
- Datacenter Operations Case Studies
- Datacenter — Trivia & Interesting Facts
- Dell PowerEdge Footguns
- Dell PowerEdge — Street-Level Ops
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist: BMC Clock Skew - Certificate Failure
- Grading Checklist: Disk Full Root - Services Down
- Grading Checklist: Firmware Update Boot Loop
- Grading Checklist: Link Flaps - Bad Optic
- Grading Checklist: Memory ECC Errors Increasing
- Grading Checklist: PXE Boot Fails - UEFI Mismatch
- Grading Checklist: Power Supply Redundancy Lost
- Grading Checklist: RAID Degraded Rebuild Latency
- Grading Checklist: Thermal Throttle - Fan Failure
- Grading Checklist: iDRAC Unreachable, OS Up
- IPMI and ipmitool -- Street Ops
- IPMI and ipmitool Footguns
- Incident Replay: BIOS Settings Reverted After CMOS Battery Replacement
- Incident Replay: BMC Clock Skew Causes Certificate Failure
- Incident Replay: Cable Plugged Into Wrong Port
- Incident Replay: Disk Full on Root Partition — Services Down
- Incident Replay: Firmware Update Causes Boot Loop
- Incident Replay: HBA Firmware Mismatch
- Incident Replay: Link Flaps from Bad Optic
- Incident Replay: Memory ECC Errors Increasing
- Incident Replay: NVMe Drive Disappeared
- Incident Replay: Network Bonding Failover Not Working
- Incident Replay: OS Install Fails — RAID Controller Not Detected
- Incident Replay: PXE Boot Fails — UEFI Mismatch
- Incident Replay: Power Supply Redundancy Lost
- Incident Replay: RAID Degraded — Rebuild Latency
- Incident Replay: Rack PDU Overload Alert
- Incident Replay: Serial Console Output Garbled
- Incident Replay: Server Intermittent Reboots
- Incident Replay: Server Remote Console Lag
- Incident Replay: Thermal Throttling from Fan Failure
- Incident Replay: iDRAC Unreachable but OS Running
- Interview: Server Won't POST
- Mellanox Switches
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions: BMC Clock Skew - Certificate Failure
- Questions: Disk Full Root - Services Down
- Questions: Firmware Update Boot Loop
- Questions: Link Flaps - Bad Optic
- Questions: Memory ECC Errors Increasing
- Questions: PXE Boot Fails - UEFI Mismatch
- Questions: Power Supply Redundancy Lost
- Questions: RAID Degraded Rebuild Latency
- Questions: Thermal Throttle - Fan Failure
- Questions: iDRAC Unreachable, OS Up
- Redfish -- Footguns
- Redfish -- Street Ops
- Scenario: NIC Flapping / LACP Mismatch
- Scenario: OOB Unreachable but Host Responds
- Scenario: RAID Array Degraded
- Scenario: Server Won't Boot After Update
- Scenario: Thermal Throttling
- Skillcheck: Datacenter
- Solution: BIOS Settings Reset After CMOS Battery Replacement
- Solution: BMC Clock Skew - Certificate Failure
- Solution: Bonding Failover Not Working
- Solution: Disk Full Root - Services Down
- Solution: Firmware Update Boot Loop
- Solution: HBA Firmware Mismatch Causing I/O Errors
- Solution: Link Flaps - Bad Optic
- Solution: Memory ECC Errors Increasing
- Solution: NVMe Drive Disappeared After Reboot
- Solution: OS Install Fails - RAID Controller Driver Missing
- Solution: PDU Overload Warning - Phase Imbalance
- Solution: PXE Boot Fails - UEFI Mismatch
- Solution: Power Supply Redundancy Lost
- Solution: RAID Degraded Rebuild Latency
- Solution: Remote KVM/Console Extremely Laggy
- Solution: Serial-over-LAN Output Garbled
- Solution: Server Cabled to Wrong Switch Port
- Solution: Server Intermittent Reboots
- Solution: Thermal Throttle - Fan Failure
- Solution: iDRAC Unreachable, OS Up
- Storage Operations - Street-Level Ops
- Storage Operations Footguns
- Symptoms: BIOS Settings Reverted After CMOS Battery Replacement
- Symptoms: BMC Clock Skew - Certificate Failure
- Symptoms: Disk Full Root - Services Down
- Symptoms: Firmware Update Boot Loop
- Symptoms: HBA Firmware Mismatch Causing I/O Errors
- Symptoms: Link Flaps - Bad Optic
- Symptoms: Memory ECC Errors Increasing
- Symptoms: NVMe Drive Disappeared After Reboot
- Symptoms: Network Bonding Failover Not Working
- Symptoms: OS Installation Cannot See Disks
- Symptoms: PDU Reporting Overload Warning
- Symptoms: PXE Boot Fails - UEFI Mismatch
- Symptoms: Power Supply Redundancy Lost
- Symptoms: RAID Degraded Rebuild Latency
- Symptoms: Remote KVM/Console Extremely Laggy
- Symptoms: Serial-over-LAN Output Garbled
- Symptoms: Server Cabled to Wrong Switch Port / Wrong VLAN
- Symptoms: Server Randomly Rebooting Every Few Hours
- Symptoms: Thermal Throttle - Fan Failure
- Symptoms: iDRAC Unreachable, OS Up
- Virtualization - Street-Level Ops
- Virtualization Footguns
datacenter-ops¶
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Investigation: Backup Job Failing, iSCSI Target Unreachable, Fix Is VLAN Config
- Investigation: Database Replication Lag, Root Cause Is RAID Degradation
- Investigation: Node NotReady, NIC Firmware Bug, Fix Is Ansible Playbook
- Remediation: Backup Job Failing, iSCSI Target Unreachable, Fix Is VLAN Config
- Remediation: Database Replication Lag, Root Cause Is RAID Degradation
- Remediation: Node NotReady, NIC Firmware Bug, Fix Is Ansible Playbook
- Symptoms: Backup Job Failing, iSCSI Target Unreachable, Fix Is VLAN Config
- Symptoms: Database Replication Lag, Root Cause Is RAID Degradation
- Symptoms: Node NotReady, NIC Firmware Bug, Fix Is Ansible Playbook
datacenter_oob_and_provisioning¶
- Anti-Primer: Bare Metal Provisioning
- Anti-Primer: Datacenter
- Bare-Metal Provisioning
- Bare-Metal Provisioning
- Datacenter & Server Hardware
- Dell Server Management
- Rack & Data Center Operations
debian¶
- Debian & Ubuntu — Footguns & Pitfalls
- Debian & Ubuntu — Street Ops
- Debian & Ubuntu — Trivia & Interesting Facts
- Primer
debian_ubuntu¶
debugging¶
- Containers Deep Dive — Trivia & Interesting Facts
- CrashLoopBackOff — Trivia & Interesting Facts
- Debugging & Diagnosis Models
- HTTP Protocol Footguns
- HTTP Protocol — Street-Level Ops
- Kubernetes Debugging Playbook — Trivia & Interesting Facts
- OOMKilled — Trivia & Interesting Facts
- Primer
- Python Debugging
- Python Debugging Footguns
- Python Debugging — Street-Level Ops
- SQL Fundamentals — Street-Level Ops
- strace Footguns
- strace — Street-Level Ops
debugging-methodology¶
debugging_methodology¶
decision_trees¶
deep-dive¶
dell-poweredge¶
dell_poweredge¶
deployment¶
- Diagnostic Questions
- Diagnostic Questions
- Grading Rubric
- Grading Rubric
- Investigation: Canary Deploy Looks Healthy, Actually Routing to Wrong Backend, Ingress Misconfigured
- Investigation: Deployment Stuck, ImagePull Auth Failure, Fix Is Vault Secret Rotation
- Progressive Delivery — Trivia & Interesting Facts
- Remediation: Canary Deploy Looks Healthy, Actually Routing to Wrong Backend, Ingress Misconfigured
- Remediation: Deployment Stuck, ImagePull Auth Failure, Fix Is Vault Secret Rotation
- Symptoms: Canary Deploy Looks Healthy, Actually Routing to Wrong Backend, Ingress Misconfigured
- Symptoms: Deployment Stuck, ImagePull Auth Failure, Fix Is Vault Secret Rotation
deployments¶
developer-experience¶
developer-tools¶
- Claude Code — Trivia & Interesting Facts
- Dagger — Trivia & Interesting Facts
- Feature Flags — Trivia & Interesting Facts
- Nix — Trivia & Interesting Facts
devex¶
devops¶
- AI DevOps Tools — Trivia & Interesting Facts
- AI Tools for DevOps - Footguns
- AI Tools for DevOps - Street Ops
- AI-Assisted DevOps Cookbook
- AI/ML Ops Footguns
- Ansible Deep Dive - Footguns
- Ansible Deep Dive - Street Ops
- Ansible Footguns
- Ansible for Infrastructure Automation - Street Ops
- Ansible: idempotence + modules vs plugins vs collections
- Ansible: inventory — hosts, groups, vars, targeting
- Ansible: playbook vs play vs task vs role vs handler
- Ansible: variable precedence
- ArgoCD & GitOps Footguns
- ArgoCD & GitOps — Street-Level Ops
- Btrfs: subvolume, snapshot, reflink, CoW
- CI Pipeline Documentation
- CI/CD Drills
- CI/CD Footguns
- CI/CD Pipelines - Street Ops
- CI/CD as a System
- CSS Fundamentals
- CSS Fundamentals Footguns
- CSS Fundamentals — Street-Level Ops
- Capacity Planning - Street-Level Ops
- Capacity Planning Footguns
- Career Engineering Footguns
- Career Engineering for Ops People - Street-Level Ops
- Ceph Storage Footguns
- Ceph Storage — Street-Level Ops
- Change Management - Street-Level Ops
- Change Management Footguns
- Chaos Engineering & Fault Injection - Street-Level Ops
- Chaos Engineering Footguns
- Cloud Operations Basics - Street Ops
- Cloud Ops Footguns
- Container vs VM
- Containers Deep Dive
- Containers Deep Dive - Footguns & Pitfalls
- Containers Deep Dive - Street-Level Ops
- Corporate IT Fluency - Street-Level Ops
- Corporate IT Fluency Footguns
- Cost Optimization & FinOps - Street-Level Ops
- Crossplane - Street-Level Ops
- Crossplane Footguns
- DNS: Stub Resolver vs Recursive Resolver vs Authoritative Server
- DORA Metrics & DevEx Footguns
- DORA Metrics & DevEx — Street-Level Ops
- Debugging Methodology - Street-Level Ops
- Debugging Methodology Footguns
- Deep Dive: CI/CD Pipeline Architecture
- Deep Dive: Terraform State Internals
- Deployment vs ReplicaSet vs Pod
- DevOps Learning Roadmap
- Distributed Systems Footguns
- Distributed Systems Fundamentals — Street-Level Ops
- Drills
- Edge & IoT Infrastructure - Street-Level Ops
- Edge & IoT Infrastructure Footguns
- Feature Flags Footguns
- Feature Flags — Street-Level Ops
- File vs inode vs pathname vs symlink
- FinOps Drills
- FinOps Footguns
- Fleet Operations Footguns
- Fleet Operations at Scale - Street-Level Ops
- Footguns
- Footguns
- Git Advanced — Trivia & Interesting Facts
- Git Drills
- Git Footguns
- Git for DevOps Engineers - Street Ops
- Git — Trivia & Interesting Facts
- Git: commit vs branch vs tag vs HEAD
- Git: rebase vs merge
- Git: working tree vs index vs repository
- GitHub Actions - Street-Level Ops
- GitHub Actions Footguns
- GitOps & ArgoCD Drills
- Helm Drills
- Homelab & Learning Infrastructure - Street-Level Ops
- Homelab Footguns
- Image vs Container
- Incident Command & On-Call - Street-Level Ops
- Incident Command & On-Call Footguns
- Incident Postmortem & SLO/SLI - Street-Level Ops
- Infrastructure Testing Footguns
- Infrastructure Testing — Street-Level Ops
- Infrastructure as Code with Terraform - Street Ops
- Interview: Config Drift Detected
- Interview: Cost Spike Investigation
- Interview: GitOps Drift Detected
- Interview: Helm Upgrade Broke Prod
- Kubernetes Control Plane as Reconciliation Engine
- Legacy System Archaeology - Street-Level Ops
- Legacy System Archaeology Footguns
- Linux: kernel vs userspace vs distro
- Load Testing Footguns
- Load Testing — Street-Level Ops
- Logs vs Metrics vs Traces
- Make & Build Systems — Footguns
- Make & Build Systems — Street Ops
- Make & Build Systems — Trivia & Interesting Facts
- Mental-Model-First Learning Guide
- Modern CLI Workflows Footguns
- MongoDB Operations Footguns
- MongoDB Operations — Street-Level Ops
- MySQL / MariaDB Operations Footguns
- MySQL / MariaDB Operations — Street-Level Ops
- Nginx & Web Servers - Street-Level Ops
- Nginx & Web Servers Footguns
- Nix / NixOS - Street-Level Ops
- Nix / NixOS Footguns
- Ops War Stories & Pattern Recognition - Street-Level Ops
- Ops War Stories & Pattern Recognition Footguns
- Permissions: mode bits vs ownership vs ACLs vs capabilities
- Persistent Volume vs Persistent Volume Claim
- Platform Engineering Footguns
- Platform Engineering Patterns - Street-Level Ops
- Platform Engineering — Trivia & Interesting Facts
- Pod vs Container (Kubernetes)
- PostgreSQL Footguns
- PostgreSQL Operations - Street-Level Ops
- Postmortem & SLO Drills
- Postmortem & SLO Footguns
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Process vs program vs service
- Progressive Delivery — Trivia & Interesting Facts
- Pulumi — Trivia & Interesting Facts
- Python Async & Concurrency
- Python Debugging
- Python Debugging Footguns
- Python Debugging — Street-Level Ops
- Python Drills
- Python Packaging
- Python Packaging Footguns
- Python Packaging — Street-Level Ops
- Python for Infrastructure - Street-Level Ops
- Python for Infrastructure Footguns
- RAID vs Backup vs Snapshot
- RHCE (EX294) — Footguns & Pitfalls
- RHCE (EX294) — Street Ops
- Redis Footguns
- Redis Operations - Street-Level Ops
- Reverse Proxy vs Load Balancer
- Runbook Craft - Street-Level Ops
- Runbook Craft Footguns
- Runbook: ArgoCD Out of Sync
- Runbook: Helm Upgrade Failed
- S3-Compatible Object Storage Footguns
- S3-Compatible Object Storage — Street-Level Ops
- SQL Fundamentals Footguns
- SQL Fundamentals — Street-Level Ops
- SRE Practices - Street-Level Ops
- SRE Practices Footguns
- Service vs Ingress (Kubernetes Networking)
- Skillcheck
- Skillcheck: CI/CD
- Skillcheck: Cloud Basics
- Skillcheck: DevOps Roadmap (Expanded)
- Skillcheck: FinOps
- Skillcheck: Git
- Skillcheck: GitOps
- Skillcheck: Helm & Release Ops
- Skillcheck: Kubernetes
- Skillcheck: Postmortems & SLOs
- Skillcheck: Python Automation
- Skillcheck: Terraform / IaC
- Storage Stack: Disk, Partition, LVM, Filesystem, Mount
- Street ops
- Street ops
- Systemd Units: Unit, Service, Target, Start vs Enable
- Systems Thinking Footguns
- Systems Thinking for Engineers - Street-Level Ops
- Systems Thinking — Trivia & Interesting Facts
- Terraform Deep Dive - Footguns
- Terraform Deep Dive - Street Ops
- Terraform Drills
- Terraform Footguns
- Terraform: Desired State Engine
- The Ops of AI/ML Workloads - Street-Level Ops
- The Psychology of Incidents - Street-Level Ops
- The Psychology of Incidents Footguns
- Track: Helm & Release Ops
- Track: Incident Response
- Track: Infrastructure
- Trivia
- Trivia
- Trivia compendium
- Trivia compendium
- VS Code Footguns
- VS Code for DevOps - Street Ops
- Vendor Management & Escalation - Street-Level Ops
- Vendor Management & Escalation Footguns
- YAML, JSON & Config Formats - Footguns
- YAML, JSON & Config Formats - Street Ops
devops-tooling
- Backstage - Street-Level Ops
- Backstage Footguns
- Dagger - Street-Level Ops
- Dagger Footguns
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Investigation: Ansible Playbook Hangs, SSH Agent Forwarding Broken, Root Cause Is Firewall Rule
- Investigation: CI Pipeline Fails, Docker Layer Cache Corruption, Fix Is Registry GC
- Investigation: Canary Deploy Looks Healthy, Actually Routing to Wrong Backend, Ingress Misconfigured
- Investigation: Container Image Vuln Scanner False Positive, Blocks Deploy Pipeline
- Investigation: Deployment Stuck, ImagePull Auth Failure, Fix Is Vault Secret Rotation
- Investigation: Disk Full Alert, Cause Is Runaway Logs, Fix Is Loki Retention
- Investigation: Node NotReady, NIC Firmware Bug, Fix Is Ansible Playbook
- Investigation: Pod OOMKilled, Memory Leak Is in Sidecar, Fix Is Helm Values
- Investigation: Terraform Apply Fails, State Lock Stuck, Root Cause Is DynamoDB Throttle
- OpenTofu - Street-Level Ops
- OpenTofu Footguns
- Pulumi - Street-Level Ops
- Pulumi Footguns
- RabbitMQ Footguns
- RabbitMQ Operations - Street-Level Ops
- Remediation: Ansible Playbook Hangs, SSH Agent Forwarding Broken, Root Cause Is Firewall Rule
- Remediation: CI Pipeline Fails, Docker Layer Cache Corruption, Fix Is Registry GC
- Remediation: Canary Deploy Looks Healthy, Actually Routing to Wrong Backend, Ingress Misconfigured
- Remediation: Container Image Vuln Scanner False Positive, Blocks Deploy Pipeline
- Remediation: Deployment Stuck, ImagePull Auth Failure, Fix Is Vault Secret Rotation
- Remediation: Disk Full Alert, Cause Is Runaway Logs, Fix Is Loki Retention
- Remediation: Node NotReady, NIC Firmware Bug, Fix Is Ansible Playbook
- Remediation: Pod OOMKilled, Memory Leak Is in Sidecar, Fix Is Helm Values
- Remediation: Terraform Apply Fails, State Lock Stuck, Root Cause Is DynamoDB Throttle
- SQLite Footguns
- SQLite Operations & Internals - Street-Level Ops
- Symptoms: Ansible Playbook Hangs, SSH Agent Forwarding Broken, Root Cause Is Firewall Rule
- Symptoms: CI Pipeline Fails, Docker Layer Cache Corruption, Fix Is Registry GC
- Symptoms: Canary Deploy Looks Healthy, Actually Routing to Wrong Backend, Ingress Misconfigured
- Symptoms: Container Image Vuln Scanner False Positive, Blocks Deploy Pipeline
- Symptoms: Deployment Stuck, ImagePull Auth Failure, Fix Is Vault Secret Rotation
- Symptoms: Disk Full Alert, Cause Is Runaway Logs, Fix Is Loki Retention
- Symptoms: Node NotReady, NIC Firmware Bug, Fix Is Ansible Playbook
- Symptoms: Pod OOMKilled, Memory Leak Is in Sidecar, Fix Is Helm Values
- Symptoms: Terraform Apply Fails, State Lock Stuck, Root Cause Is DynamoDB Throttle
- WebAssembly for Infrastructure - Street-Level Ops
- WebAssembly for Infrastructure Footguns
dhcp
dhcp_ipam
disaster-recovery
disaster_recovery
disk
- Diagnostic Questions
- Grading Rubric
- Investigation: Disk Full Alert, Cause Is Runaway Logs, Fix Is Loki Retention
- Linux Ops Storage
- Remediation: Disk Full Alert, Cause Is Runaway Logs, Fix Is Loki Retention
- Symptoms: Disk Full Alert, Cause Is Runaway Logs, Fix Is Loki Retention
disk-and-storage-ops
disk-ops
disk-troubleshooting
disk_storage_ops
- Anti-Primer: Disk And Storage Ops
- Anti-Primer: Storage Ops
- Disk & Storage Ops - Street-Level Ops
- Disk & Storage Ops Footguns
- Storage Operations
- Thinking Out Loud: Disk & Storage Ops
distributed-storage
distributed-systems
distributed_systems
distributions
distro
- Linux Distribution Comparison — Footguns & Pitfalls
- Linux Distribution Comparison — Street Ops
- Primer
dkim
dmarc
dnf
dns
- AWS Route 53
- AWS Route 53 - Street-Level Ops
- AWS Route 53 Footguns
- Anti-Primer: DNS Ops
- Case Study: CoreDNS Timeout Pod DNS
- DHCP & IP Address Management - Street-Level Ops
- DHCP & IP Address Management Footguns
- DNS Deep Dive - Footguns
- DNS Deep Dive - Street-Level Ops
- DNS Operations
- DNS Operations - Street-Level Ops
- DNS Operations Footguns
- Diagnostic Questions
- Grading Checklist
- Grading Rubric
- Investigation: DNS Looks Broken, TLS Is Expired, Fix Is in Cert-Manager
- Network Traps & Deep Debugging
- Networking - Street Ops
- Networking Drills
- Networking Footguns
- Primer
- Primer
- Primer
- Primer
- Primer
- Questions to Determine
- Remediation: DNS Looks Broken, TLS Is Expired, Fix Is in Cert-Manager
- Scenario: DNS Looks Fine but App Fails
- Skillcheck: Networking Fundamentals
- Solution
- Symptoms
- Symptoms: DNS Looks Broken, TLS Is Expired, Fix Is in Cert-Manager
- Thinking Out Loud: DNS Ops
dns-security
dns_deep_dive
dnssec
- Anti-Primer: DNSSEC
- DNSSEC & DNS Security
- DNSSEC & DNS Security Footguns
- DNSSEC & DNS Security — Street-Level Ops
- Primer
docker
- Anti-Primer: Docker
- Container Base Images — Footguns & Pitfalls
- Container Base Images — Street Ops
- Containers Deep Dive
- Containers Deep Dive - Footguns & Pitfalls
- Containers Deep Dive - Street-Level Ops
- Deep Dive: Containers How They Really Work
- Deep Dive: Docker Image Internals
- Diagnostic Questions
- Docker
- Docker / Containers - Street-Level Ops
- Docker Drills
- Docker Footguns
- Docker — Trivia & Interesting Facts
- Grading Rubric
- Interview: Docker Container Debugging
- Investigation: CI Pipeline Fails, Docker Layer Cache Corruption, Fix Is Registry GC
- Primer
- Primer
- Remediation: CI Pipeline Fails, Docker Layer Cache Corruption, Fix Is Registry GC
- Runbook: ImagePullBackOff
- Skillcheck: Docker
- Symptoms: CI Pipeline Fails, Docker Layer Cache Corruption, Fix Is Registry GC
- Thinking Out Loud: Docker
- Track: Containers
- Track: Foundations
doh
domains
dora-metrics
dora_metrics
dot
drill
- Alerting Rules Drills
- CI/CD Drill Answers
- CI/CD Drills
- Cloud Deep Dive Drills
- Cloud Ops Drills
- Container Runtime Drills
- Database Ops Drills
- Datacenter Drills
- Docker Drills
- Drill: Advanced Stash Usage
- Drill: Analyze Network Path Quality with mtr
- Drill: Basic Port Scanning with nmap
- Drill: Build an Event Timeline for Debugging
- Drill: Capture and Filter Traffic with tcpdump
- Drill: Check Resource Quotas and Limit Ranges
- Drill: Cherry-Pick Specific Commits Between Branches
- Drill: Code Search with ripgrep (rg)
- Drill: Create a Systemd Drop-in Override
- Drill: Debug DNS with dig
- Drill: Debug Network Interfaces with ethtool
- Drill: Debug a Pod Stuck in Pending State
- Drill: Effective kubectl get Output Formats
- Drill: Examine TCP Connection States with ss
- Drill: Explore Systemd Dependency Tree
- Drill: Explore the /proc Filesystem for a Process
- Drill: Filter Journal Entries by Time, Unit, and Priority
- Drill: Find Open Files with lsof
- Drill: Find What Is Consuming Disk Space
- Drill: Find a Regression with git bisect
- Drill: Get Logs from Multi-Container Pods
- Drill: HTTP Debugging with curl
- Drill: Inspect Cgroup Resource Limits and Usage
- Drill: Inspect and Decode ConfigMaps and Secrets
- Drill: Inspect and Manage the ARP Table
- Drill: Interactive Rebase to Clean Up Commits
- Drill: Manage Network Connections with nmcli
- Drill: Read and Analyze Saved pcap Files
- Drill: Read and Manipulate the Routing Table
- Drill: Recover Lost Commits with Reflog
- Drill: Safely Drain a Kubernetes Node
- Drill: Trace Process Relationships
- Drill: Trace Syscalls with strace
- Drill: Understanding git diff Variants
- Drill: Use Port-Forward to Test Services and Pods
- Drill: Work on Multiple Branches with git worktree
- Drill: fd as a Modern find Replacement
- Drill: fzf Integration Patterns
- Drill: fzf Interactive Selection
- Drill: jq Recipes for JSON Processing
- Drill: tmux Session Management
- Drills
- Drills
- FinOps Drills
- Git Drills
- GitOps & ArgoCD Drills
- Helm Drill Answers
- Helm Drills
- Kubernetes Operators Drills
- Linux Ops Drills
- LogQL Drills
- Modern CLI Drills
- Networking Drills
- Observability Drill Answers
- Observability Drills
- Policy Engine Drills
- Postmortem & SLO Drills
- PromQL Drills
- Python Drills
- Secrets Management Drills
- Security Drills
- Service Mesh Drills
- TLS & PKI Drills
- Terraform Drills
- etcd Drills
- kubectl Drill Answers
- kubectl Drills
dynamodb
- Diagnostic Questions
- Grading Rubric
- Investigation: Terraform Apply Fails, State Lock Stuck, Root Cause Is DynamoDB Throttle
- Remediation: Terraform Apply Fails, State Lock Stuck, Root Cause Is DynamoDB Throttle
- Symptoms: Terraform Apply Fails, State Lock Stuck, Root Cause Is DynamoDB Throttle
ebpf
- Anti-Primer: eBPF Observability
- Continuous Profiling Footguns
- Continuous Profiling — Street-Level Ops
- Linux Performance Tuning - Street-Level Ops
- Linux Performance Tuning Footguns
- Primer
- Primer
- Primer
- Primer
- Runtime Security with Falco Footguns
- Runtime Security with Falco — Street-Level Ops
- eBPF & Modern Linux Observability
- eBPF & Modern Linux Observability - Street-Level Ops
- eBPF & Modern Linux Observability Footguns
ec2
ecosystem
edge
edge-iot
edge_iot
elasticsearch
email-infrastructure
- Email Infrastructure Footguns
- Email Infrastructure — Street-Level Ops
- Email Infrastructure — Trivia & Interesting Facts
- Primer
email_infrastructure
environment_variables
- Anti-Primer: Environment Variables
- Environment Variables
- Environment Variables - Street-Level Ops
- Environment Variables Footguns
envoy
- Anti-Primer: Envoy
- Diagnostic Questions
- Envoy Proxy
- Footguns
- Grading Rubric
- Investigation: Service Mesh 503s, Envoy Misconfigured, Root Cause Is RBAC Policy
- Istio Service Mesh Footguns
- Istio Service Mesh — Street-Level Ops
- Primer
- Primer
- Remediation: Service Mesh 503s, Envoy Misconfigured, Root Cause Is RBAC Policy
- Street ops
- Symptoms: Service Mesh 503s, Envoy Misconfigured, Root Cause Is RBAC Policy
etcd
- Anti-Primer: Etcd
- Interview: etcd Space Exceeded
- Runbook: etcd Backup & Restore
- Scenario: etcd Troubleshooting
- Skillcheck: etcd
- etcd
- etcd - Street-Level Ops
- etcd Drills
- etcd Footguns
event-streaming
evolution
- Evolution Guides — "How We Got Here"
- How We Got Here: Application Architecture
- How We Got Here: Artifact Management
- How We Got Here: CI/CD Evolution
- How We Got Here: Container Evolution
- How We Got Here: From Bare Metal to Serverless
- How We Got Here: Incident Management
- How We Got Here: Kubernetes Itself
- How We Got Here: Logging Evolution
- How We Got Here: Monitoring Evolution
- How We Got Here: Service Communication
falco
- Anti-Primer: Falco
- Primer
- Runtime Security with Falco
- Runtime Security with Falco Footguns
- Runtime Security with Falco — Street-Level Ops
fd
- Anti-Primer: fd
- Modern CLI Tools - Street Ops
- Modern CLI Tools Footguns
- Primer
- Skillcheck: Modern CLI Tools
- fd
- fd - Street-Level Ops
- fd Footguns
feature-flags
feature_flags
filesystem
filesystems
- Inodes
- Inodes — Trivia & Interesting Facts
- Linux Ops Storage
- Mounts & Filesystems — Trivia & Interesting Facts
find_command
finops
- Anti-Primer: Finops
- Cost Optimization & FinOps - Street-Level Ops
- FinOps & Cost Optimization
- FinOps Drills
- FinOps Footguns
- FinOps — Trivia & Interesting Facts
- Interview: Cost Spike Investigation
- Primer
- Skillcheck: FinOps
firewall
- Diagnostic Questions
- Grading Rubric
- Investigation: Ansible Playbook Hangs, SSH Agent Forwarding Broken, Root Cause Is Firewall Rule
- Primer
- Remediation: Ansible Playbook Hangs, SSH Agent Forwarding Broken, Root Cause Is Firewall Rule
- Symptoms: Ansible Playbook Hangs, SSH Agent Forwarding Broken, Root Cause Is Firewall Rule
- iptables & nftables
- iptables & nftables - Street-Level Ops
- iptables & nftables Footguns
firewalls
- Anti-Primer: Firewalls
- Firewall Footguns
- Firewalls
- Firewalls - Street-Level Ops
- Primer
- VPN & Tunneling - Street-Level Ops
- VPN & Tunneling Footguns
firmware
- Anti-Primer: Firmware
- Case Study: PXE Boot Fails UEFI Mismatch
- Datacenter & Server Hardware - Street Ops
- Datacenter Advanced Operations
- Datacenter Footguns
- Dell PowerEdge Footguns
- Dell PowerEdge — Street-Level Ops
- Diagnostic Questions
- Firmware
- Firmware & BIOS - Street-Level Ops
- Firmware & BIOS Footguns
- Grading Checklist: PXE Boot Fails - UEFI Mismatch
- Grading Rubric
- Interview: Server Won't POST
- Investigation: Node NotReady, NIC Firmware Bug, Fix Is Ansible Playbook
- Primer
- Primer
- Questions: PXE Boot Fails - UEFI Mismatch
- Remediation: Node NotReady, NIC Firmware Bug, Fix Is Ansible Playbook
- Scenario: Server Won't Boot After Update
- Solution: PXE Boot Fails - UEFI Mismatch
- Symptoms: Node NotReady, NIC Firmware Bug, Fix Is Ansible Playbook
- Symptoms: PXE Boot Fails - UEFI Mismatch
flamegraph
- Continuous Profiling Footguns
- Continuous Profiling — Street-Level Ops
- Primer
- perf Profiling Footguns
- perf Profiling — Street-Level Ops
flashcards
- Knowledge Compendiums
- Shuffled Trivia Compendium
- Trivia compendium
- Trivia compendium
- Trivia compendium
fleet-ops
- Fleet Operations Footguns
- Fleet Operations at Scale - Street-Level Ops
- Fleet Ops — Trivia & Interesting Facts
- Primer
fleet_ops
floating-point
footguns
forensics
- Anti-Primer: Infra Forensics
- Infrastructure Forensics
- Infrastructure Forensics - Street-Level Ops
- Infrastructure Forensics Footguns
- Primer
frontend
frontend-debugging
fundamentals
fzf
- Anti-Primer: fzf
- Modern CLI Drills
- Modern CLI Tools - Street Ops
- Modern CLI Tools Footguns
- Primer
- Skillcheck: Modern CLI Tools
- fzf
- fzf - Street-Level Ops
- fzf Footguns
gcp
gcp_troubleshooting
- Anti-Primer: GCP Troubleshooting
- GCP Troubleshooting
- GCP Troubleshooting - Street-Level Ops
- GCP Troubleshooting Footguns
git
- Anti-Primer: Git
- Git Drills
- Git Footguns
- Git Workflows & Branching Strategies
- Git for DevOps
- Git for DevOps Engineers - Street Ops
- Interview: Secret Leaked to Git
- Primer
- Skillcheck: Git
- Thinking Out Loud: Git
- Track: Foundations
git_advanced
git_workflows
github
github-actions
github_actions
gitops
- Argo CD & GitOps — Trivia & Interesting Facts
- ArgoCD & GitOps Footguns
- ArgoCD & GitOps — Street-Level Ops
- GitOps & ArgoCD Drills
- GitOps — Trivia & Interesting Facts
- Interview: Config Drift Detected
- Interview: GitOps Drift Detected
- Primer
- Runbook: ArgoCD Out of Sync
- Skillcheck: GitOps
- Track: Helm & Release Ops
grafana
- Diagnostic Questions
- Grading Rubric
- Investigation: Grafana Dashboard Empty, Prometheus Scrape Blocked by NetworkPolicy
- Monitoring Fundamentals - Street-Level Ops
- Monitoring Fundamentals Footguns
- Monitoring Migration (Legacy to Modern) - Street-Level Ops
- Monitoring Migration Footguns
- Observability Deep Dive - Street Ops
- Observability Footguns
- Primer
- Primer
- Primer
- Remediation: Grafana Dashboard Empty, Prometheus Scrape Blocked by NetworkPolicy
- Skillcheck: Observability
- Symptoms: Grafana Dashboard Empty, Prometheus Scrape Blocked by NetworkPolicy
- Track: Observability
graphql
grep_and_regex
grep_regex
grpc
guide
- Bare-Metal Provisioning
- Cluster Management Guide
- Cluster Upgrade Exercise
- Dell Server Management
- GitOps Example
- Mental-Model-First Learning Guide
- Modern CLI Tools
- Rack & Data Center Operations
- Security Scanning
- Troubleshooting
hardware
hardware-security
- IPMI and ipmitool -- Street Ops
- IPMI and ipmitool Footguns
- Primer
- Primer
- Redfish -- Footguns
- Redfish -- Street Ops
hashicorp-vault
hashicorp_vault
health-checks
- Diagnostic Questions
- Grading Rubric
- Investigation: Alert Storm, Caused by Flapping Health Checks, Fix Is Probe Tuning
- Remediation: Alert Storm, Caused by Flapping Health Checks, Fix Is Probe Tuning
- Symptoms: Alert Storm, Caused by Flapping Health Checks, Fix Is Probe Tuning
helm
- Anti-Primer: Helm
- Diagnostic Questions
- Grading Rubric
- Helm
- Helm - Street-Level Ops
- Helm Drills
- Helm Footguns
- Helm — Trivia & Interesting Facts
- Interview: Helm Upgrade Broke Prod
- Investigation: Pod OOMKilled, Memory Leak Is in Sidecar, Fix Is Helm Values
- Remediation: Pod OOMKilled, Memory Leak Is in Sidecar, Fix Is Helm Values
- Runbook: Helm Upgrade Failed
- Skillcheck: Helm & Release Ops
- Skillcheck: Kubernetes
- Symptoms: Pod OOMKilled, Memory Leak Is in Sidecar, Fix Is Helm Values
- Thinking Out Loud: Helm
- Track: Helm & Release Ops
homelab
- Anti-Primer: Homelab
- Homelab & Learning Infrastructure
- Homelab & Learning Infrastructure - Street-Level Ops
- Homelab Footguns
- Primer
hpa
- Diagnostic Questions
- Grading Rubric
- Interview: HPA Not Scaling
- Investigation: HPA Flapping, Metrics Server Clock Skew, Fix Is NTP Config
- Kubernetes Node Lifecycle & Cluster Upgrades
- Kubernetes Ops Footguns
- Practical Kubernetes Ops - Street Ops
- Primer
- Remediation: HPA Flapping, Metrics Server Clock Skew, Fix Is NTP Config
- Runbook: HPA Not Scaling
- Symptoms: HPA Flapping, Metrics Server Clock Skew, Fix Is NTP Config
http
- HTTP Protocol Footguns
- HTTP Protocol — Street-Level Ops
- Primer
- Primer
- curl & wget
- curl & wget — Footguns
- curl & wget — Street-Level Ops
http_protocol
human-factors
iac
- Ansible — Trivia & Interesting Facts
- Crossplane — Trivia & Interesting Facts
- Nix — Trivia & Interesting Facts
- OpenTofu — Trivia & Interesting Facts
- Packer — Trivia & Interesting Facts
- Primer
- Terraform Deep Dive - Footguns
- Terraform Deep Dive - Street Ops
iam
imagepull
- Diagnostic Questions
- Grading Rubric
- Investigation: Deployment Stuck, ImagePull Auth Failure, Fix Is Vault Secret Rotation
- Remediation: Deployment Stuck, ImagePull Auth Failure, Fix Is Vault Secret Rotation
- Symptoms: Deployment Stuck, ImagePull Auth Failure, Fix Is Vault Secret Rotation
incident-command
incident-psychology
- Incident Psychology — Trivia & Interesting Facts
- Primer
- The Psychology of Incidents - Street-Level Ops
- The Psychology of Incidents Footguns
incident-response
- Change Management - Street-Level Ops
- Change Management Footguns
- Debugging Methodology - Street-Level Ops
- Debugging Methodology Footguns
- Incident Command & On-Call - Street-Level Ops
- Incident Command & On-Call Footguns
- Incident Postmortem & SLO/SLI - Street-Level Ops
- Ops War Stories & Pattern Recognition - Street-Level Ops
- Ops War Stories & Pattern Recognition Footguns
- Postmortem & SLO Footguns
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Runbook Craft - Street-Level Ops
- Runbook Craft Footguns
- Systems Thinking Footguns
- Systems Thinking for Engineers - Street-Level Ops
- The Psychology of Incidents - Street-Level Ops
- The Psychology of Incidents Footguns
- Track: Incident Response
- Vendor Management & Escalation - Street-Level Ops
- Vendor Management & Escalation Footguns
incident-triage
incident_psychology
incident_response
- Anti-Primer: Chaos Engineering
- Chaos Engineering & Fault Injection
- Interview Gauntlet: API Returning 503s
- Interview Gauntlet: Alerts Firing but System Seems Fine
- Interview Gauntlet: Customer Reports Data Inconsistency
- Interview Gauntlet: Deploy Succeeded but Old Version Visible
- Interview Gauntlet: Disk Usage on Prod Database
- Interview Gauntlet: Pods Crash-Looping
- Lab 15: Incident Response
- Lab 16: Chaos Engineering
incident_triage
- Anti-Primer: Incident Triage
- Decision Tree: Alert Fired — Is This Real?
- Decision Tree: Deployment Is Stuck
- Decision Tree: Disk Is Filling Up
- Decision Tree: Latency Has Increased
- Decision Tree: Memory Usage Is High
- Decision Tree: Node Is NotReady
- Decision Tree: Pod Won't Start
- Decision Tree: Service Returning 5xx Errors
- Incident Triage
- Incident Triage - Street-Level Ops
- Incident Triage Footguns
- Thinking Out Loud: Incident Triage
infra-testing
infra_testing
infrastructure
- API Gateways — Trivia & Interesting Facts
- Bare Metal Provisioning — Trivia & Interesting Facts
- Capacity Planning — Trivia & Interesting Facts
- Ceph — Trivia & Interesting Facts
- Cloud Deep Dive — Trivia & Interesting Facts
- DNS Deep Dive - Footguns
- DNS Deep Dive - Street-Level Ops
- Disaster Recovery — Trivia & Interesting Facts
- Edge & IoT — Trivia & Interesting Facts
- Firmware — Trivia & Interesting Facts
- HashiCorp Vault — Trivia & Interesting Facts
- Homelab — Trivia & Interesting Facts
- Infrastructure Forensics — Trivia & Interesting Facts
- Infrastructure Testing — Trivia & Interesting Facts
- OpenTelemetry — Trivia & Interesting Facts
- Packer — Trivia & Interesting Facts
- Policy Engines — Trivia & Interesting Facts
- Primer
- Python for Infrastructure — Trivia & Interesting Facts
- etcd — Trivia & Interesting Facts
ingress
- Diagnostic Questions
- Grading Rubric
- Investigation: Canary Deploy Looks Healthy, Actually Routing to Wrong Backend, Ingress Misconfigured
- Kubernetes Services & Ingress - Street Ops
- Kubernetes Services & Ingress Footguns
- Primer
- Remediation: Canary Deploy Looks Healthy, Actually Routing to Wrong Backend, Ingress Misconfigured
- Symptoms: Canary Deploy Looks Healthy, Actually Routing to Wrong Backend, Ingress Misconfigured
init
inodes
- Anti-Primer: Inodes
- Incident Replay: Inode Exhaustion
- Inode Footguns
- Inodes
- Inodes - Street-Level Ops
interactive
interview
- Interview Gauntlet: API Returning 503s
- Interview Gauntlet: Alerts Firing but System Seems Fine
- Interview Gauntlet: Ansible Playbook 9x Slower
- Interview Gauntlet: CI/CD for a Monorepo
- Interview Gauntlet: Container Image Build and Distribution Pipeline
- Interview Gauntlet: Container Using 2x Expected Memory
- Interview Gauntlet: Customer Reports Data Inconsistency
- Interview Gauntlet: Deploy Succeeded but Old Version Visible
- Interview Gauntlet: Disagreeing with a Technical Decision
- Interview Gauntlet: Disk Usage on Prod Database
- Interview Gauntlet: Flaky CI Build
- Interview Gauntlet: GitOps or Traditional CI/CD?
- Interview Gauntlet: Handling a Production Incident
- Interview Gauntlet: Improving Team Development Workflow
- Interview Gauntlet: Intermittent gRPC Failures
- Interview Gauntlet: Kubernetes or Simpler Orchestrator?
- Interview Gauntlet: Learning Something Quickly
- Interview Gauntlet: Log Aggregation Pipeline
- Interview Gauntlet: Managed Database or Self-Hosted?
- Interview Gauntlet: Monitoring Stack from Scratch
- Interview Gauntlet: Monolith or Microservices?
- Interview Gauntlet: Multi-Region Kubernetes Deployment
- Interview Gauntlet: Network Latency Spikes Every 30 Seconds
- Interview Gauntlet: Pods Crash-Looping
- Interview Gauntlet: Secrets Management System
- Interview Gauntlet: Should We Use a Service Mesh?
- Interview Gauntlet: Terraform Plan Shows 47 Resources to Destroy/Recreate
- Interview Gauntlet: When Automation Went Wrong
- Interview Gauntlet: Your Approach to On-Call
- Interview Gauntlet: eBPF for Observability
- Interview Scenarios
interview-prep
- Knowledge Compendiums
- Shuffled Trivia Compendium
- Trivia compendium
- Trivia compendium
- Trivia compendium
iot
ip
ipmi
ipmi-and-ipmitool
ipmi_and_ipmitool
iptables_nftables
iscsi
- Diagnostic Questions
- Grading Rubric
- Investigation: Backup Job Failing, iSCSI Target Unreachable, Fix Is VLAN Config
- Remediation: Backup Job Failing, iSCSI Target Unreachable, Fix Is VLAN Config
- Symptoms: Backup Job Failing, iSCSI Target Unreachable, Fix Is VLAN Config
istio
- Anti-Primer: Istio
- Istio Service Mesh
- Istio Service Mesh Footguns
- Istio Service Mesh — Street-Level Ops
- Primer
job-queue
- Diagnostic Questions
- Grading Rubric
- Investigation: Job Queue Backlog, Worker Pod CPU Throttled, Fix Is cgroup Config
- Remediation: Job Queue Backlog, Worker Pod CPU Throttled, Fix Is cgroup Config
- Symptoms: Job Queue Backlog, Worker Pod CPU Throttled, Fix Is cgroup Config
jq
- Modern CLI Drills
- Modern CLI Tools - Street Ops
- Modern CLI Tools Footguns
- Primer
- Skillcheck: Modern CLI Tools
json
- Primer
- YAML, JSON & Config Formats - Footguns
- YAML, JSON & Config Formats - Street Ops
- jq — Trivia & Interesting Facts
k6
k8s
- API Gateways & Ingress - Street-Level Ops
- API Gateways & Ingress Footguns
- Argo Workflows Footguns
- Argo Workflows — Street-Level Ops
- Case Study: CNI Broken After Restart
- Case Study: CoreDNS Timeout Pod DNS
- Case Study: CrashLoopBackOff No Logs
- Case Study: DaemonSet Blocks Eviction
- Case Study: Drain Blocked by PDB
- Case Study: ImagePullBackOff Registry Auth
- Case Study: Node Pressure Evictions
- Case Study: Persistent Volume Stuck Terminating
- Case Study: Resource Quota Blocking Deploy
- Case Study: Service No Endpoints
- Container Base Images — Footguns & Pitfalls
- Container Base Images — Street Ops
- Container Runtime Drills
- Database Operations - Street-Level Ops
- Database Ops Drills
- Database Ops Footguns
- Deep Dive: Containers How They Really Work
- Deep Dive: Docker Image Internals
- Deep Dive: Kubernetes Networking
- Deep Dive: Kubernetes Pod Lifecycle
- Docker Drills
- Envoy Proxy — Trivia & Interesting Facts
- Footguns
- Footguns
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Incident Replay: CNI Broken After Node Restart
- Incident Replay: CoreDNS Timeout — Pod DNS Resolution Failing
- Incident Replay: CrashLoopBackOff with No Logs
- Incident Replay: DaemonSet Blocks Node Eviction
- Incident Replay: ImagePullBackOff — Registry Authentication Failure
- Incident Replay: Node Drain Blocked by PDB
- Incident Replay: Node Pressure Evictions
- Incident Replay: Persistent Volume Stuck Terminating
- Incident Replay: Resource Quota Blocking Deployment
- Incident Replay: Service Has No Endpoints
- Interview: Database Failover During Deploy
- Interview: Deployment Stuck Progressing
- Interview: Docker Container Debugging
- Interview: HPA Not Scaling
- Interview: Ingress 404
- Interview: Kyverno Blocking Deploys
- Interview: Pods OOMKilled
- Interview: RBAC Forbidden
- Interview: Service Mesh 503s
- Interview: etcd Space Exceeded
- Istio Service Mesh Footguns
- Istio Service Mesh — Street-Level Ops
- Istio — Trivia & Interesting Facts
- K8s Concept Chain — Footguns
- K8s Concept Chain — Street-Level Ops
- Kubernetes Concept Chain
- Kubernetes Debugging -- Street Ops
- Kubernetes Debugging Footguns
- Kubernetes Node Lifecycle & Cluster Upgrades
- Kubernetes Node Lifecycle -- Street Ops
- Kubernetes Node Lifecycle Footguns
- Kubernetes Operations Case Studies
- Kubernetes Operators Drills
- Kubernetes Ops Footguns
- Kubernetes Pods & Scheduling - Street Ops
- Kubernetes Pods & Scheduling Footguns
- Kubernetes Services & Ingress - Street Ops
- Kubernetes Services & Ingress Footguns
- Multi-Tenancy Patterns - Street-Level Ops
- Multi-Tenancy Patterns Footguns
- Policy Engine Drills
- Policy Engine Footguns
- Policy Engines - Street-Level Ops
- Practical Kubernetes Ops - Street Ops
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Progressive Delivery Footguns
- Progressive Delivery — Street-Level Ops
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Runbook: Disaster Recovery
- Runbook: HPA Not Scaling
- Runbook: ImagePullBackOff
- Runbook: Ingress 404
- Runbook: Istio 503 Errors
- Runbook: Kyverno Blocking Workloads
- Runbook: NetworkPolicy Block
- Runbook: Pod Eviction
- Runbook: RBAC Forbidden
- Runbook: Readiness Probe Failed
- Runbook: Velero Backup & Restore
- Runbook: etcd Backup & Restore
- Scenario: etcd Troubleshooting
- Service Mesh - Street-Level Ops
- Service Mesh Drills
- Service Mesh Footguns
- Skillcheck: Container Runtime Debug
- Skillcheck: Database Ops
- Skillcheck: Docker
- Skillcheck: Kubernetes Operators
- Skillcheck: Kubernetes Under the Covers
- Skillcheck: Policy Engines
- Skillcheck: Service Mesh
- Skillcheck: etcd
- Solution
- Solution
- Solution
- Solution
- Solution
- Solution
- Solution
- Solution
- Solution
- Solution
- Street ops
- Street ops
- Symptoms
- Symptoms
- Symptoms
- Symptoms
- Symptoms
- Symptoms
- Symptoms
- Symptoms
- Symptoms
- Symptoms
- Track: Containers
- Track: Kubernetes Core
- Trivia
- cert-manager Footguns
- cert-manager — Street-Level Ops
- etcd Drills
- kubectl Debugging Cheatsheet
- kubectl Drills
k8s-core
- AI/ML Ops Footguns
- Case Study: CrashLoopBackOff No Logs
- Case Study: Drain Blocked by PDB
- Case Study: Node Pressure Evictions
- Chaos Engineering & Fault Injection - Street-Level Ops
- Chaos Engineering Footguns
- Deep Dive: Kubernetes Pod Lifecycle
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Interview: Deployment Stuck Progressing
- Kubernetes Debugging -- Street Ops
- Kubernetes Debugging Footguns
- Platform Engineering Footguns
- Platform Engineering Patterns - Street-Level Ops
- Primer
- Primer
- Primer
- Primer
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Runbook: Disaster Recovery
- Runbook: ImagePullBackOff
- Runbook: Pod Eviction
- Runbook: Readiness Probe Failed
- Runbook: Velero Backup & Restore
- Skillcheck: Kubernetes Under the Covers
- Solution
- Solution
- Solution
- Symptoms
- Symptoms
- Symptoms
- The Ops of AI/ML Workloads - Street-Level Ops
- Track: Kubernetes Core
- kubectl Debugging Cheatsheet
- kubectl Drills
k8s-debugging
k8s-ecosystem
k8s-networking
- API Gateways & Ingress - Street-Level Ops
- API Gateways & Ingress Footguns
- Case Study: CNI Broken After Restart
- Case Study: CoreDNS Timeout Pod DNS
- Deep Dive: Kubernetes Networking
- Grading Checklist
- Grading Checklist
- Interview: Ingress 404
- Multi-Tenancy Patterns - Street-Level Ops
- Multi-Tenancy Patterns Footguns
- Primer
- Primer
- Primer
- Questions to Determine
- Questions to Determine
- Runbook: Ingress 404
- Runbook: NetworkPolicy Block
- Service Mesh - Street-Level Ops
- Service Mesh Footguns
- Skillcheck: Kubernetes Under the Covers
- Solution
- Solution
- Symptoms
- Symptoms
- Track: Kubernetes Core
k8s-node-lifecycle¶
k8s-operators¶
k8s-rbac¶
- Interview: RBAC Forbidden
- Multi-Tenancy Patterns - Street-Level Ops
- Multi-Tenancy Patterns Footguns
- Policy Engine Footguns
- Policy Engines - Street-Level Ops
- Primer
- Primer
- Runbook: RBAC Forbidden
- Track: Kubernetes Core
k8s-storage¶
k8s_debugging_playbook¶
- Anti-Primer: Kubernetes Debugging Playbook
- Kubernetes Debugging Playbook
- Thinking Out Loud: Kubernetes Debugging
k8s_ecosystem¶
k8s_networking¶
- Anti-Primer: Kubernetes Networking
- K8s Networking
- Kubernetes Networking - Street-Level Ops
- Kubernetes Networking Footguns
- Thinking Out Loud: Kubernetes Networking
k8s_node_lifecycle¶
- Anti-Primer: Kubernetes Node Lifecycle
- Kubernetes Node Lifecycle
- Thinking Out Loud: Kubernetes Node Lifecycle
k8s_ops¶
k8s_pods_and_scheduling¶
- Anti-Primer: Kubernetes Pods And Scheduling
- Kubernetes Pods & Scheduling
- Thinking Out Loud: Kubernetes Pods & Scheduling
k8s_rbac¶
- Anti-Primer: Kubernetes RBAC
- K8s RBAC
- RBAC - Street-Level Ops
- RBAC Footguns
- Thinking Out Loud: Kubernetes RBAC
k8s_services_and_ingress¶
- Anti-Primer: Kubernetes Services And Ingress
- Kubernetes Services & Ingress
- Thinking Out Loud: Kubernetes Services & Ingress
k8s_storage¶
- Anti-Primer: Kubernetes Storage
- K8s Storage
- Kubernetes Storage - Street-Level Ops
- Kubernetes Storage Footguns
- Thinking Out Loud: Kubernetes Storage
kafka¶
kernel¶
- Kernel Troubleshooting — Trivia & Interesting Facts
- Linux Kernel Tuning - Street-Level Ops
- Linux Kernel Tuning Footguns
- Primer
- Primer
- cgroups & Linux Namespaces - Street Ops
- cgroups & Namespaces Footguns
kernel-troubleshooting¶
kernel_troubleshooting¶
kms¶
- Diagnostic Questions
- Grading Rubric
- Investigation: User Auth Failing, OIDC Cert Expired, Fix Is Cloud KMS Rotation
- Remediation: User Auth Failing, OIDC Cert Expired, Fix Is Cloud KMS Rotation
- Symptoms: User Auth Failing, OIDC Cert Expired, Fix Is Cloud KMS Rotation
knowledge-architecture¶
kubernetes¶
- Argo CD & GitOps — Trivia & Interesting Facts
- Argo Workflows — Trivia & Interesting Facts
- Cilium & eBPF Networking - Street-Level Ops
- Cilium & eBPF Networking Footguns
- CrashLoopBackOff — Trivia & Interesting Facts
- Crossplane — Trivia & Interesting Facts
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Falco — Trivia & Interesting Facts
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Helm — Trivia & Interesting Facts
- Investigation: Alert Storm, Caused by Flapping Health Checks, Fix Is Probe Tuning
- Investigation: CI Pipeline Fails, Docker Layer Cache Corruption, Fix Is Registry GC
- Investigation: Canary Deploy Looks Healthy, Actually Routing to Wrong Backend, Ingress Misconfigured
- Investigation: Container Image Vuln Scanner False Positive, Blocks Deploy Pipeline
- Investigation: DNS Looks Broken, TLS Is Expired, Fix Is in Cert-Manager
- Investigation: Database Replication Lag, Root Cause Is RAID Degradation
- Investigation: Deployment Stuck, ImagePull Auth Failure, Fix Is Vault Secret Rotation
- Investigation: Grafana Dashboard Empty, Prometheus Scrape Blocked by NetworkPolicy
- Investigation: HPA Flapping, Metrics Server Clock Skew, Fix Is NTP Config
- Investigation: Job Queue Backlog, Worker Pod CPU Throttled, Fix Is cgroup Config
- Investigation: Node NotReady, NIC Firmware Bug, Fix Is Ansible Playbook
- Investigation: Pod OOMKilled, Memory Leak Is in Sidecar, Fix Is Helm Values
- Investigation: Service Mesh 503s, Envoy Misconfigured, Root Cause Is RBAC Policy
- Investigation: User Auth Failing, OIDC Cert Expired, Fix Is Cloud KMS Rotation
- Kubernetes Debugging Playbook — Trivia & Interesting Facts
- Kubernetes Ecosystem - Street-Level Ops
- Kubernetes Ecosystem Footguns
- Kubernetes Ecosystem — Trivia & Interesting Facts
- Kubernetes Networking — Trivia & Interesting Facts
- Kubernetes Node Lifecycle — Trivia & Interesting Facts
- Kubernetes Ops — Trivia & Interesting Facts
- Kubernetes RBAC — Trivia & Interesting Facts
- Kubernetes Storage — Trivia & Interesting Facts
- Kustomize - Street-Level Ops
- Kustomize Footguns
- Kustomize — Trivia & Interesting Facts
- Multi-Tenancy — Trivia & Interesting Facts
- OOMKilled — Trivia & Interesting Facts
- Remediation: Alert Storm, Caused by Flapping Health Checks, Fix Is Probe Tuning
- Remediation: CI Pipeline Fails, Docker Layer Cache Corruption, Fix Is Registry GC
- Remediation: Canary Deploy Looks Healthy, Actually Routing to Wrong Backend, Ingress Misconfigured
- Remediation: Container Image Vuln Scanner False Positive, Blocks Deploy Pipeline
- Remediation: DNS Looks Broken, TLS Is Expired, Fix Is in Cert-Manager
- Remediation: Database Replication Lag, Root Cause Is RAID Degradation
- Remediation: Deployment Stuck, ImagePull Auth Failure, Fix Is Vault Secret Rotation
- Remediation: Grafana Dashboard Empty, Prometheus Scrape Blocked by NetworkPolicy
- Remediation: HPA Flapping, Metrics Server Clock Skew, Fix Is NTP Config
- Remediation: Job Queue Backlog, Worker Pod CPU Throttled, Fix Is cgroup Config
- Remediation: Node NotReady, NIC Firmware Bug, Fix Is Ansible Playbook
- Remediation: Pod OOMKilled, Memory Leak Is in Sidecar, Fix Is Helm Values
- Remediation: Service Mesh 503s, Envoy Misconfigured, Root Cause Is RBAC Policy
- Remediation: User Auth Failing, OIDC Cert Expired, Fix Is Cloud KMS Rotation
- Symptoms: Alert Storm, Caused by Flapping Health Checks, Fix Is Probe Tuning
- Symptoms: CI Pipeline Fails, Docker Layer Cache Corruption, Fix Is Registry GC
- Symptoms: Canary Deploy Looks Healthy, Actually Routing to Wrong Backend, Ingress Misconfigured
- Symptoms: Container Image Vuln Scanner False Positive, Blocks Deploy Pipeline
- Symptoms: DNS Looks Broken, TLS Is Expired, Fix Is in Cert-Manager
- Symptoms: Database Replication Lag, Root Cause Is RAID Degradation
- Symptoms: Deployment Stuck, ImagePull Auth Failure, Fix Is Vault Secret Rotation
- Symptoms: Grafana Dashboard Empty, Prometheus Scrape Blocked by NetworkPolicy
- Symptoms: HPA Flapping, Metrics Server Clock Skew, Fix Is NTP Config
- Symptoms: Job Queue Backlog, Worker Pod CPU Throttled, Fix Is cgroup Config
- Symptoms: Node NotReady, NIC Firmware Bug, Fix Is Ansible Playbook
- Symptoms: Pod OOMKilled, Memory Leak Is in Sidecar, Fix Is Helm Values
- Symptoms: Service Mesh 503s, Envoy Misconfigured, Root Cause Is RBAC Policy
- Symptoms: User Auth Failing, OIDC Cert Expired, Fix Is Cloud KMS Rotation
- cert-manager — Trivia & Interesting Facts
- etcd — Trivia & Interesting Facts
kustomize¶
l1¶
- Email Infrastructure Footguns
- Email Infrastructure — Street-Level Ops
- Primer
- Primer
- cert-manager Footguns
- cert-manager — Street-Level Ops
l2¶
- BGP EVPN / VXLAN Footguns
- BGP EVPN / VXLAN — Street-Level Ops
- DNSSEC & DNS Security Footguns
- DNSSEC & DNS Security — Street-Level Ops
- Network Automation Footguns
- Network Automation — Street-Level Ops
- Primer
- Primer
- Primer
- Primer
- Runtime Security with Falco Footguns
- Runtime Security with Falco — Street-Level Ops
l7-proxy¶
lab¶
- Hands-On Labs
- Lab 10: RBAC & Security
- Lab 11: Monitoring Stack
- Lab 12: CI/CD Pipeline
- Lab 13: Terraform IaC
- Lab 14: Log Analysis
- Lab 15: Incident Response
- Lab 16: Chaos Engineering
- Lab 17: Performance Tuning
- Lab 18: Zero-Downtime Migration
- Lab 19: Multi-Cluster
- Lab 1: Linux Triage
- Lab 20: Platform Engineering
- Lab 21: Production Readiness Review
- Lab 22: Incident Simulation
- Lab 23: Architecture Review
- Lab 24: On-Call Shift
- Lab 25: Tech Lead Challenge
- Lab 2: Container Basics
- Lab 3: Networking Fundamentals
- Lab 4: Git Operations
- Lab 5: Shell Scripting
- Lab 6: Deploy & Scale
- Lab 7: Pod Debugging
- Lab 8: Service Networking
- Lab 9: Storage & State
- Solution: Lab Runtime 01 -- Readiness Probe Failure
- Solution: Lab Runtime 02 -- HPA Live Scaling
- Solution: Lab Runtime 03 -- Observability Target Down
- Solution: Lab Runtime 04 -- Loki No Logs
- Solution: Lab Runtime 05 -- Helm Upgrade Rollback
- Solution: Lab Runtime 06 -- Trivy Fail to Green
- Solution: Lab Runtime 07 -- GitOps Sync and Drift
- Solution: Lab Runtime 08 -- Resource Limits OOM
lacp¶
- Anti-Primer: LACP
- Case Study: Bonding Failover Not Working
- Cisco Fundamentals -- Street Ops
- Cisco Fundamentals Footguns
- Grading Checklist
- LACP
- LACP - Street-Level Ops
- LACP Footguns
- Network Traps & Deep Debugging
- Networking - Street Ops
- Networking Footguns
- Primer
- Primer
- Questions to Determine
- Scenario: NIC Flapping / LACP Mismatch
- Solution: Bonding Failover Not Working
- Symptoms: Network Bonding Failover Not Working
lambda¶
latency¶
- Diagnostic Questions
- Grading Rubric
- Investigation: API Latency Spike, BGP Route Leak, Fix Is Network ACL
- Remediation: API Latency Spike, BGP Route Leak, Fix Is Network ACL
- Symptoms: API Latency Spike, BGP Route Leak, Fix Is Network ACL
launchdarkly¶
ldap¶
ldap_identity¶
learning¶
learning-method¶
learning_paths¶
least-privilege¶
legacy-archaeology¶
legacy-systems¶
legacy_systems¶
lesson¶
- Ansible - The Complete Guide (Revised, Current, Production-Focused)
- Ansible one screen interview quick ref
- Linux - Foundations and Operations Guide
- Linux interview quick ref
- Overview
- Python for Infrastructure Automation
- Python for infrastructure interview one screen
lets-encrypt¶
lfcs¶
library¶
linux¶
- Advanced Bash Footguns
- Advanced Bash for Ops - Street-Level Ops
- Advanced Bash — Trivia & Interesting Facts
- Binary and Floating Point Footguns
- Binary and Floating Point — Street-Level Ops
- Binary and Floats
- Case Study: IPTables Blocking Unexpected
- Case Study: Inode Exhaustion
- Case Study: Kernel Soft Lockup
- Case Study: OOM Killer Events
- Case Study: Runaway Logs Fill Disk
- Case Study: SELinux Denying Service
- Case Study: Stuck NFS Mount
- Case Study: Systemd Service Flapping
- Case Study: Time Sync Skew Breaks App
- Case Study: Zombie Processes Accumulating
- Cron & Job Scheduling - Street-Level Ops
- Cron & Job Scheduling Footguns
- Cron Scheduling — Trivia & Interesting Facts
- DNF Package Manager
- Debian & Ubuntu — Footguns & Pitfalls
- Debian & Ubuntu — Street Ops
- Debian & Ubuntu — Trivia & Interesting Facts
- Deep Dive: Linux Boot Sequence
- Deep Dive: Linux Memory Management
- Deep Dive: Linux Performance Debugging
- Deep Dive: Systemd Architecture
- Disk & Storage Ops
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Incident Replay: Inode Exhaustion
- Incident Replay: Kernel Soft Lockup
- Incident Replay: OOM Killer Events
- Incident Replay: Runaway Logs Fill Disk
- Incident Replay: SELinux Denying Service
- Incident Replay: Stuck NFS Mount
- Incident Replay: Time Sync Skew Breaks Application
- Incident Replay: Zombie Processes Accumulating
- Incident Replay: iptables Blocking Unexpected Traffic
- Incident Replay: systemd Service Flapping
- Inodes
- Inodes — Trivia & Interesting Facts
- Interview: Linux Server Slow
- Kernel Troubleshooting - Street-Level Ops
- Kernel Troubleshooting Footguns
- Kernel Troubleshooting — Trivia & Interesting Facts
- LPIC & LFCS — Trivia & Interesting Facts
- LPIC / LFCS — Footguns & Pitfalls
- LPIC / LFCS — Street Ops
- Linux Boot Process
- Linux Boot Process — Footguns & Pitfalls
- Linux Boot Process — Street Ops
- Linux Boot Process — Trivia & Interesting Facts
- Linux Data Hoarding
- Linux Deep Triage
- Linux Distribution Comparison — Footguns & Pitfalls
- Linux Distribution Comparison — Street Ops
- Linux Distro Comparison — Trivia & Interesting Facts
- Linux Hardening — Trivia & Interesting Facts
- Linux Kernel Tuning - Street-Level Ops
- Linux Kernel Tuning Footguns
- Linux Logging
- Linux Logging — Footguns
- Linux Logging — Street Ops
- Linux Logging — Trivia & Interesting Facts
- Linux Memory Management
- Linux Memory Management — Footguns
- Linux Memory Management — Street Ops
- Linux Memory Management — Trivia & Interesting Facts
- Linux Operations Case Studies
- Linux Ops Drills
- Linux Ops Footguns
- Linux Ops Storage
- Linux Ops Storage — Trivia & Interesting Facts
- Linux Ops Systemd
- Linux Ops — Trivia & Interesting Facts
- Linux Ops — systemd — Trivia & Interesting Facts
- Linux Performance Tuning - Street-Level Ops
- Linux Performance Tuning Footguns
- Linux Performance — Trivia & Interesting Facts
- Linux Signals & Process Control - Footguns
- Linux Signals & Process Control - Street-Level Ops
- Linux System Administration - Street Ops
- Linux Text Processing
- Linux Text Processing - Street-Level Ops
- Linux Text Processing Footguns
- Linux Text Processing — Trivia & History
- Linux Users & Permissions
- Linux Users and Permissions — Footguns & Pitfalls
- Linux Users and Permissions — Street Ops
- Linux Users and Permissions — Trivia & Interesting Facts
- Modern CLI Workflows
- Mounts & Filesystems — Trivia & Interesting Facts
- OOMKilled — Trivia & Interesting Facts
- Package Management — Trivia & Interesting Facts
- Performance Profiling — Trivia & Interesting Facts
- Pipes & Redirection - Footguns
- Pipes & Redirection - Street-Level Ops
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Process Management - Street-Level Ops
- Process Management Footguns
- Process Management — Trivia & Interesting Facts
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- RHCE — Trivia & Interesting Facts
- SSH Deep Dive
- SSH Deep Dive — Footguns
- SSH Deep Dive — Street-Level Ops
- SSH Deep Dive — Trivia & Interesting Facts
- Skillcheck: Bash
- Skillcheck: Linux Fundamentals
- Solution
- Solution
- Solution
- Solution
- Solution
- Solution
- Solution
- Solution
- Solution
- Solution
- Symptoms
- Symptoms
- Symptoms
- Symptoms
- Symptoms
- Symptoms
- Symptoms
- Symptoms
- Symptoms
- Symptoms
- Terminal Internals
- Terminal Internals - Street Ops
- Terminal Internals Footguns
- Track: Foundations
- Trivia compendium
- awk — Footguns
- awk — Street-Level Ops
- awk — Trivia & Interesting Facts
- awk: The Record/Field Processor
- cgroups & Linux Namespaces - Street Ops
- cgroups & Namespaces Footguns
- curl & wget
- curl & wget — Footguns
- curl & wget — Street-Level Ops
- eBPF & Modern Linux Observability - Street-Level Ops
- eBPF & Modern Linux Observability Footguns
- find - Footguns & Pitfalls
- find - Street-Level Ops
- find — Trivia & Interesting Facts
- grep & Regular Expressions
- grep & Regular Expressions - Footguns
- grep & Regular Expressions - Street-Level Ops
- iptables & nftables
- iptables & nftables - Street-Level Ops
- iptables & nftables Footguns
- iptables & nftables — Trivia & Interesting Facts
- perf Profiling
- perf Profiling Footguns
- perf Profiling — Street-Level Ops
- rsync - Street Ops
- rsync Footguns
- rsync — Trivia & Interesting Facts
- sed — Footguns
- sed — Street-Level Ops
- sed — Trivia & Interesting Facts
- sed: The Stream Editor
- strace Footguns
- strace — Street-Level Ops
- systemctl & journalctl Footguns
- systemctl & journalctl Street Ops
- tar & Compression - Footguns
- tar & Compression - Street-Level Ops
- tmux & screen
- xargs - Footguns & Pitfalls
- xargs - Street Ops
- xargs — Trivia & Interesting Facts
linux-fundamentals¶
- Advanced Bash Footguns
- Advanced Bash for Ops - Street-Level Ops
- Case Study: Inode Exhaustion
- Case Study: OOM Killer Events
- Case Study: SELinux Denying Service
- Case Study: Time Sync Skew Breaks App
- Deep Dive: Linux Boot Sequence
- Deep Dive: Linux Memory Management
- Deep Dive: Linux Performance Debugging
- Edge & IoT Infrastructure - Street-Level Ops
- Edge & IoT Infrastructure Footguns
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Interview: Linux Server Slow
- Kernel Troubleshooting - Street-Level Ops
- Kernel Troubleshooting Footguns
- Linux Deep Triage
- Linux Ops Drills
- Linux Ops Footguns
- Linux Performance Tuning - Street-Level Ops
- Linux Performance Tuning Footguns
- Linux System Administration - Street Ops
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Process Management - Street-Level Ops
- Process Management Footguns
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Regex & Text Wrangling - Street-Level Ops
- Regex & Text Wrangling Footguns
- SELinux & AppArmor - Street-Level Ops
- SELinux & AppArmor Footguns
- Skillcheck: Linux Fundamentals
- Solution
- Solution
- Solution
- Solution
- Symptoms
- Symptoms
- Symptoms
- Symptoms
- Track: Foundations
- Virtualization - Street-Level Ops
- Virtualization Footguns
- eBPF & Modern Linux Observability - Street-Level Ops
- eBPF & Modern Linux Observability Footguns
linux-hardening¶
- Compliance & Audit Automation - Street-Level Ops
- Compliance & Audit Automation Footguns
- Infrastructure Forensics - Street-Level Ops
- Infrastructure Forensics Footguns
- LDAP & Identity Management - Street-Level Ops
- LDAP & Identity Management Footguns
- Primer
- Primer
- Primer
- Primer
- Primer
- SELinux & AppArmor - Street-Level Ops
- SELinux & AppArmor Footguns
- SELinux & Linux Hardening - Street-Level Ops
- SELinux & Linux Hardening Footguns
linux-networking¶
- Deep Dive: Linux Network Packet Flow
- Networking Drills
- Primer
- Scenario: Asymmetric Routing
- Scenario: Duplex Mismatch
- Scenario: MTU Blackhole
- Scenario: NIC Flapping / LACP Mismatch
- Scenario: OOB Unreachable but Host Responds
- VPN & Tunneling - Street-Level Ops
- VPN & Tunneling Footguns
linux-ops¶
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Investigation: Ansible Playbook Hangs, SSH Agent Forwarding Broken, Root Cause Is Firewall Rule
- Investigation: Backup Job Failing, iSCSI Target Unreachable, Fix Is VLAN Config
- Investigation: CI Pipeline Fails, Docker Layer Cache Corruption, Fix Is Registry GC
- Investigation: Database Replication Lag, Root Cause Is RAID Degradation
- Investigation: Disk Full Alert, Cause Is Runaway Logs, Fix Is Loki Retention
- Investigation: HPA Flapping, Metrics Server Clock Skew, Fix Is NTP Config
- Investigation: Job Queue Backlog, Worker Pod CPU Throttled, Fix Is cgroup Config
- Investigation: SSH Timeout, MTU Mismatch, Fix Is Terraform Variable
- Remediation: Ansible Playbook Hangs, SSH Agent Forwarding Broken, Root Cause Is Firewall Rule
- Remediation: Backup Job Failing, iSCSI Target Unreachable, Fix Is VLAN Config
- Remediation: CI Pipeline Fails, Docker Layer Cache Corruption, Fix Is Registry GC
- Remediation: Database Replication Lag, Root Cause Is RAID Degradation
- Remediation: Disk Full Alert, Cause Is Runaway Logs, Fix Is Loki Retention
- Remediation: HPA Flapping, Metrics Server Clock Skew, Fix Is NTP Config
- Remediation: Job Queue Backlog, Worker Pod CPU Throttled, Fix Is cgroup Config
- Remediation: SSH Timeout, MTU Mismatch, Fix Is Terraform Variable
- Symptoms: Ansible Playbook Hangs, SSH Agent Forwarding Broken, Root Cause Is Firewall Rule
- Symptoms: Backup Job Failing, iSCSI Target Unreachable, Fix Is VLAN Config
- Symptoms: CI Pipeline Fails, Docker Layer Cache Corruption, Fix Is Registry GC
- Symptoms: Database Replication Lag, Root Cause Is RAID Degradation
- Symptoms: Disk Full Alert, Cause Is Runaway Logs, Fix Is Loki Retention
- Symptoms: HPA Flapping, Metrics Server Clock Skew, Fix Is NTP Config
- Symptoms: Job Queue Backlog, Worker Pod CPU Throttled, Fix Is cgroup Config
- Symptoms: SSH Timeout, MTU Mismatch, Fix Is Terraform Variable
linux-performance¶
linux_boot_process¶
linux_data_hoarding¶
linux_distro_comparison¶
linux_fundamentals¶
linux_hardening¶
linux_kernel_tuning¶
linux_logging¶
linux_memory_management¶
linux_ops¶
linux_ops_storage¶
linux_ops_systemd¶
- Anti-Primer: Linux Ops Systemd
- Thinking Out Loud: Linux Ops — systemd
- systemd Footguns
- systemd Street Ops
linux_performance¶
linux_signals_and_process_control¶
linux_text_processing¶
linux_users_and_permissions¶
load-balancing¶
- API Gateways & Ingress - Street-Level Ops
- API Gateways & Ingress Footguns
- HAProxy & Nginx Load Balancing Footguns
- HAProxy & Nginx for Ops - Street-Level Ops
- Nginx & Web Servers - Street-Level Ops
- Nginx & Web Servers Footguns
- Primer
- Primer
- Primer
load-testing¶
- Load Testing Footguns
- Load Testing — Street-Level Ops
- Load Testing — Trivia & Interesting Facts
- Primer
load_balancing¶
load_testing¶
log-pipelines¶
log_pipelines¶
logging¶
- Diagnostic Questions
- Grading Rubric
- Investigation: Disk Full Alert, Cause Is Runaway Logs, Fix Is Loki Retention
- Linux Logging
- Linux Logging — Footguns
- Linux Logging — Street Ops
- Log Pipelines - Street-Level Ops
- Log Pipelines Footguns
- Primer
- Primer
- Remediation: Disk Full Alert, Cause Is Runaway Logs, Fix Is Loki Retention
- Symptoms: Disk Full Alert, Cause Is Runaway Logs, Fix Is Loki Retention
loki¶
- Diagnostic Questions
- Grading Rubric
- Interview: Loki Logs Disappeared
- Investigation: Disk Full Alert, Cause Is Runaway Logs, Fix Is Loki Retention
- Log Pipelines - Street-Level Ops
- Log Pipelines Footguns
- LogQL Drills
- Observability Deep Dive - Street Ops
- Observability Drills
- Observability Footguns
- Primer
- Primer
- Remediation: Disk Full Alert, Cause Is Runaway Logs, Fix Is Loki Retention
- Runbook: Loki No Logs
- Skillcheck: Observability
- Symptoms: Disk Full Alert, Cause Is Runaway Logs, Fix Is Loki Retention
- Track: Observability
lpic¶
lpic_lfcs¶
lvm¶
make_and_build_systems¶
mellanox¶
mellanox_switches¶
memory-management¶
- Linux Memory Management
- Linux Memory Management — Footguns
- Linux Memory Management — Street Ops
- Primer
mental-model¶
- Architecture & Design Models
- Debugging & Diagnosis Models
- Human Factors Models
- Mental Model Library
- Mental Model: 12-Factor App
- Mental Model: Alert Fatigue
- Mental Model: Amdahl's Law
- Mental Model: Automation Complacency
- Mental Model: Bisect
- Mental Model: Blameless Postmortem
- Mental Model: Blast Radius
- Mental Model: Bulkhead
- Mental Model: CAP Theorem
- Mental Model: Cattle vs Pets
- Mental Model: Circuit Breaker
- Mental Model: Correlation vs Causation
- Mental Model: Differential Diagnosis
- Mental Model: Error Budget
- Mental Model: Event Sourcing
- Mental Model: Failure Domains
- Mental Model: Five Whys
- Mental Model: Graceful Degradation
- Mental Model: Hindsight Bias
- Mental Model: Idempotency
- Mental Model: Immutable Infrastructure
- Mental Model: Little's Law
- Mental Model: Normalization of Deviance
- Mental Model: OODA Loop
- Mental Model: PACELC
- Mental Model: Queueing Theory
- Mental Model: RED Method
- Mental Model: Runbook-Driven Recovery
- Mental Model: Shift Left
- Mental Model: Sidecar Pattern
- Mental Model: Strangler Fig
- Mental Model: Swiss Cheese Model
- Mental Model: Toil vs Automation ROI
- Mental Model: USE Method
- Operational Reasoning Models
- System Behavior Models
mental-models¶
- Ansible: idempotence + modules vs plugins vs collections
- Ansible: inventory — hosts, groups, vars, targeting
- Ansible: playbook vs play vs task vs role vs handler
- Ansible: variable precedence
- Btrfs: subvolume, snapshot, reflink, CoW
- CI/CD as a System
- Container vs VM
- DNS: Stub Resolver vs Recursive Resolver vs Authoritative Server
- Deployment vs ReplicaSet vs Pod
- File vs inode vs pathname vs symlink
- Git: commit vs branch vs tag vs HEAD
- Git: rebase vs merge
- Git: working tree vs index vs repository
- Image vs Container
- Kubernetes Control Plane as Reconciliation Engine
- Linux: kernel vs userspace vs distro
- Logs vs Metrics vs Traces
- Mental Models (Core Concepts)
- Permissions: mode bits vs ownership vs ACLs vs capabilities
- Persistent Volume vs Persistent Volume Claim
- Pod vs Container (Kubernetes)
- Process vs program vs service
- RAID vs Backup vs Snapshot
- Reverse Proxy vs Load Balancer
- Service vs Ingress (Kubernetes Networking)
- Storage Stack: Disk, Partition, LVM, Filesystem, Mount
- Systemd Units: Unit, Service, Target, Start vs Enable
- Terraform: Desired State Engine
mergerfs¶
message-queues¶
message_queues¶
- Anti-Primer: Kafka
- Anti-Primer: Message Queues
- Anti-Primer: RabbitMQ
- Kafka
- Kafka - Street-Level Ops
- Kafka Footguns
- Message Queues
- RabbitMQ & Message Queues
messaging¶
migrations¶
ml-ops¶
mlops¶
modern-cli¶
- Modern CLI Drills
- Modern CLI Tools - Street Ops
- Modern CLI Tools Footguns
- Primer
- Skillcheck: Modern CLI Tools
modern-cli-workflows¶
modern_cli¶
modern_cli_workflows¶
mongodb¶
mongodb_ops¶
monitoring-fundamentals¶
- Monitoring Fundamentals - Street-Level Ops
- Monitoring Fundamentals Footguns
- Monitoring Migration (Legacy to Modern) - Street-Level Ops
- Monitoring Migration Footguns
- Primer
- Primer
monitoring-migration¶
monitoring_fundamentals¶
monitoring_migration¶
mounts_filesystems¶
- Anti-Primer: Mounts Filesystems
- Mounts & Filesystems - Street-Level Ops
- Mounts & Filesystems Footguns
- Mounts Filesystems
mtu¶
- Anti-Primer: MTU
- Case Study: MTU Blackhole TLS Stalls
- Diagnostic Questions
- Grading Checklist
- Grading Rubric
- Investigation: SSH Timeout, MTU Mismatch, Fix Is Terraform Variable
- MTU
- MTU - Street-Level Ops
- MTU Footguns
- Questions to Determine
- Remediation: SSH Timeout, MTU Mismatch, Fix Is Terraform Variable
- Scenario: MTU Blackhole
- Solution: MTU Black Hole / TLS Stalls
- Symptoms: MTU Black Hole / TLS Stalls
- Symptoms: SSH Timeout, MTU Mismatch, Fix Is Terraform Variable
multi-tenancy¶
- Multi-Tenancy Patterns - Street-Level Ops
- Multi-Tenancy Patterns Footguns
- Multi-Tenancy — Trivia & Interesting Facts
- Primer
multi_tenancy¶
mysql¶
mysql_ops¶
namespaces¶
- Containers Deep Dive
- Containers Deep Dive - Footguns & Pitfalls
- Containers Deep Dive - Street-Level Ops
- Primer
napalm¶
nat¶
navigation¶
netconf¶
netmiko¶
network-automation¶
- Network Automation Footguns
- Network Automation — Street-Level Ops
- Network Automation — Trivia & Interesting Facts
- Primer
network_automation¶
networking¶
- API Gateways — Trivia & Interesting Facts
- ARP — Trivia & Interesting Facts
- AWS Networking
- AWS Networking - Street-Level Ops
- AWS Networking Footguns
- AWS Route 53
- AWS Route 53 - Street-Level Ops
- AWS Route 53 Footguns
- Anti-Primer: Networking
- BGP EVPN / VXLAN Footguns
- BGP EVPN / VXLAN — Street-Level Ops
- BGP EVPN VXLAN — Trivia & Interesting Facts
- Case Study: ARP Flux Duplicate IP
- Case Study: Asymmetric Routing One Direction
- Case Study: BGP Peer Flapping
- Case Study: DHCP Relay Broken
- Case Study: DNS Resolution Slow
- Case Study: DNS Split Horizon Confusion
- Case Study: Duplex Mismatch Symptoms
- Case Study: Firewall Shadow Rule
- Case Study: Jumbo Frames Partial
- Case Study: LACP Mismatch One Link Hot
- Case Study: MTU Blackhole TLS Stalls
- Case Study: Multicast Not Crossing Router
- Case Study: NAT Exhaustion Intermittent
- Case Study: Network Loop Broadcast Storm
- Case Study: OSPF Stuck In Exstart
- Case Study: Proxy ARP Causing Issues
- Case Study: SSL Cert Chain Incomplete
- Case Study: Source Routing Policy Miss
- Case Study: TCP RST After Idle
- Case Study: VLAN Trunk Mistag
- Cilium — Trivia & Interesting Facts
- Cisco Fundamentals -- Street Ops
- Cisco Fundamentals Footguns
- Cisco Fundamentals for DevOps — Trivia & Interesting Facts
- Comparison: CNI Plugins
- Comparison: Ingress Controllers
- Comparison: Service Meshes
- DHCP & IP Address Management - Street-Level Ops
- DHCP & IP Address Management Footguns
- DHCP & IPAM — Trivia & Interesting Facts
- DNS Deep Dive - Footguns
- DNS Deep Dive - Street-Level Ops
- DNS Operations - Street-Level Ops
- DNS Operations Footguns
- DNS Operations — Trivia & Interesting Facts
- DNSSEC & DNS Security Footguns
- DNSSEC & DNS Security — Street-Level Ops
- DNSSEC — Trivia & Interesting Facts
- Deep Dive: Linux Network Packet Flow
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Email Infrastructure Footguns
- Email Infrastructure — Street-Level Ops
- Firewalls — Trivia & Interesting Facts
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist: DHCP Not Working on Remote VLAN
- Grading Checklist: DNS Resolution Taking 5+ Seconds Intermittently
- Grading Checklist: Jumbo Frames Enabled But Some Paths Failing
- Grading Checklist: Multicast Traffic Not Crossing Router
- Grading Checklist: Network Experiencing Broadcast Storm and High CPU on Switches
- Grading Checklist: OSPF Adjacency Stuck in ExStart/Exchange State
- Grading Checklist: Proxy ARP Causing Unexpected Routing Behavior
- Grading Checklist: TCP Connections Reset After Idle Period
- Grading Checklist: TLS Works From Some Clients But Fails From Others
- Grading Checklist: Traffic From Specific Source Not Taking Expected Path
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- HAProxy & Nginx Load Balancing Footguns
- HAProxy & Nginx for Ops - Street-Level Ops
- HTTP Protocol Footguns
- HTTP Protocol — Street-Level Ops
- HTTP Protocol — Trivia & Interesting Facts
- Incident Replay: ARP Flux — Duplicate IP Detection
- Incident Replay: Asymmetric Routing — Traffic Works One Direction Only
- Incident Replay: BGP Peer Flapping
- Incident Replay: DHCP Relay Broken
- Incident Replay: DNS Resolution Slow
- Incident Replay: DNS Split-Horizon Confusion
- Incident Replay: Duplex Mismatch Symptoms
- Incident Replay: Firewall Shadow Rule
- Incident Replay: Jumbo Frames Partial Deployment
- Incident Replay: LACP Mismatch — One Link Hot
- Incident Replay: MTU Blackhole — TLS Stalls
- Incident Replay: Multicast Not Crossing Router
- Incident Replay: NAT Exhaustion — Intermittent Connectivity
- Incident Replay: Network Loop — Broadcast Storm
- Incident Replay: OSPF Stuck in ExStart
- Incident Replay: Proxy ARP Causing Issues
- Incident Replay: SSL Certificate Chain Incomplete
- Incident Replay: Source Routing Policy Miss
- Incident Replay: TCP RST After Idle
- Incident Replay: VLAN Trunk Mistag
- Investigation: API Latency Spike, BGP Route Leak, Fix Is Network ACL
- Investigation: Alert Storm, Caused by Flapping Health Checks, Fix Is Probe Tuning
- Investigation: Ansible Playbook Hangs, SSH Agent Forwarding Broken, Root Cause Is Firewall Rule
- Investigation: Backup Job Failing, iSCSI Target Unreachable, Fix Is VLAN Config
- Investigation: Canary Deploy Looks Healthy, Actually Routing to Wrong Backend, Ingress Misconfigured
- Investigation: DNS Looks Broken, TLS Is Expired, Fix Is in Cert-Manager
- Investigation: Grafana Dashboard Empty, Prometheus Scrape Blocked by NetworkPolicy
- Investigation: SSH Timeout, MTU Mismatch, Fix Is Terraform Variable
- Investigation: Service Mesh 503s, Envoy Misconfigured, Root Cause Is RBAC Policy
- Kubernetes Networking — Trivia & Interesting Facts
- Kubernetes Services & Ingress - Street Ops
- Kubernetes Services & Ingress Footguns
- LACP — Trivia & Interesting Facts
- Load Balancing — Trivia & Interesting Facts
- MTU — Trivia & Interesting Facts
- Mellanox Switches
- NAT — Trivia & Interesting Facts
- Network Automation Footguns
- Network Automation — Street-Level Ops
- Network Traps & Deep Debugging
- Networking - Street Ops
- Networking Case Studies
- Networking Deep Dive
- Networking Drills
- Networking Footguns
- Networking Troubleshooting — Trivia & Interesting Facts
- Networking — Trivia & Interesting Facts
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions: DHCP Not Working on Remote VLAN
- Questions: DNS Resolution Taking 5+ Seconds Intermittently
- Questions: Jumbo Frames Enabled But Some Paths Failing
- Questions: Multicast Traffic Not Crossing Router
- Questions: Network Experiencing Broadcast Storm and High CPU on Switches
- Questions: OSPF Adjacency Stuck in ExStart/Exchange State
- Questions: Proxy ARP Causing Unexpected Routing Behavior
- Questions: TCP Connections Reset After Idle Period
- Questions: TLS Works From Some Clients But Fails From Others
- Questions: Traffic From Specific Source Not Taking Expected Path
- Remediation: API Latency Spike, BGP Route Leak, Fix Is Network ACL
- Remediation: Alert Storm, Caused by Flapping Health Checks, Fix Is Probe Tuning
- Remediation: Ansible Playbook Hangs, SSH Agent Forwarding Broken, Root Cause Is Firewall Rule
- Remediation: Backup Job Failing, iSCSI Target Unreachable, Fix Is VLAN Config
- Remediation: Canary Deploy Looks Healthy, Actually Routing to Wrong Backend, Ingress Misconfigured
- Remediation: DNS Looks Broken, TLS Is Expired, Fix Is in Cert-Manager
- Remediation: Grafana Dashboard Empty, Prometheus Scrape Blocked by NetworkPolicy
- Remediation: SSH Timeout, MTU Mismatch, Fix Is Terraform Variable
- Remediation: Service Mesh 503s, Envoy Misconfigured, Root Cause Is RBAC Policy
- Routing — Trivia & Interesting Facts
- SSH Deep Dive
- SSH Deep Dive — Footguns
- SSH Deep Dive — Street-Level Ops
- STP — Trivia & Interesting Facts
- Scenario: Asymmetric Routing
- Scenario: DNS Looks Fine but App Fails
- Scenario: Duplex Mismatch
- Scenario: MTU Blackhole
- Scenario: VLAN Trunk Mismatch
- Service Mesh — Trivia & Interesting Facts
- Skillcheck: Networking Fundamentals
- Solution: ARP Flux / Duplicate IP
- Solution: Asymmetric Routing / One-Direction Failure
- Solution: BGP Peer Flapping
- Solution: DHCP Not Working on Remote VLAN
- Solution: DNS Resolution Taking 5+ Seconds Intermittently
- Solution: DNS Split-Horizon Confusion
- Solution: Duplex Mismatch
- Solution: Firewall Shadow Rule
- Solution: Jumbo Frames Enabled But Some Paths Failing
- Solution: LACP Mismatch / One Link Hot
- Solution: MTU Black Hole / TLS Stalls
- Solution: Multicast Traffic Not Crossing Router
- Solution: NAT Port Exhaustion / Intermittent Failures
- Solution: Network Experiencing Broadcast Storm and High CPU on Switches
- Solution: OSPF Adjacency Stuck in ExStart/Exchange State
- Solution: Proxy ARP Causing Unexpected Routing Behavior
- Solution: TCP Connections Reset After Idle Period
- Solution: TLS Works From Some Clients But Fails From Others
- Solution: Traffic From Specific Source Not Taking Expected Path
- Solution: VLAN Trunk Mistag
- Subnetting & IP Addressing — Trivia & Interesting Facts
- Symptoms: API Latency Spike, BGP Route Leak, Fix Is Network ACL
- Symptoms: ARP Flux / Duplicate IP
- Symptoms: Alert Storm, Caused by Flapping Health Checks, Fix Is Probe Tuning
- Symptoms: Ansible Playbook Hangs, SSH Agent Forwarding Broken, Root Cause Is Firewall Rule
- Symptoms: Asymmetric Routing / One-Direction Failure
- Symptoms: BGP Peer Flapping
- Symptoms: Backup Job Failing, iSCSI Target Unreachable, Fix Is VLAN Config
- Symptoms: Canary Deploy Looks Healthy, Actually Routing to Wrong Backend, Ingress Misconfigured
- Symptoms: DHCP Not Working on Remote VLAN
- Symptoms: DNS Looks Broken, TLS Is Expired, Fix Is in Cert-Manager
- Symptoms: DNS Resolution Taking 5+ Seconds Intermittently
- Symptoms: DNS Split-Horizon Confusion
- Symptoms: Duplex Mismatch
- Symptoms: Firewall Shadow Rule
- Symptoms: Grafana Dashboard Empty, Prometheus Scrape Blocked by NetworkPolicy
- Symptoms: Jumbo Frames Enabled But Some Paths Failing
- Symptoms: LACP Mismatch / One Link Hot
- Symptoms: MTU Black Hole / TLS Stalls
- Symptoms: Multicast Traffic Not Crossing Router
- Symptoms: NAT Port Exhaustion / Intermittent Failures
- Symptoms: Network Experiencing Broadcast Storm and High CPU on Switches
- Symptoms: OSPF Adjacency Stuck in ExStart/Exchange State
- Symptoms: Proxy ARP Causing Unexpected Routing Behavior
- Symptoms: SSH Timeout, MTU Mismatch, Fix Is Terraform Variable
- Symptoms: Service Mesh 503s, Envoy Misconfigured, Root Cause Is RBAC Policy
- Symptoms: TCP Connections Reset After Idle Period
- Symptoms: TLS Works From Some Clients But Fails From Others
- Symptoms: Traffic From Specific Source Not Taking Expected Path
- Symptoms: VLAN Trunk Mistag
- TCP/IP Deep Dive - Street-Level Ops
- TCP/IP Deep Dive Footguns
- Tailscale - Street-Level Ops
- Tailscale Footguns
- Tailscale — Trivia & Interesting Facts
- VLANs — Trivia & Interesting Facts
- VPN & Tunneling - Street-Level Ops
- VPN & Tunneling Footguns
- VPN & Tunneling — Trivia & Interesting Facts
- Wireshark / tshark / tcpdump - Street-Level Ops
- Wireshark / tshark / tcpdump Footguns
- Wireshark — Trivia & Interesting Facts
- curl & wget
- curl & wget — Footguns
- curl & wget — Street-Level Ops
- gRPC - Street-Level Ops
- gRPC Footguns
- gRPC — Trivia & Interesting Facts
- iptables & nftables
- iptables & nftables - Street-Level Ops
- iptables & nftables Footguns
- nginx Web Servers — Trivia & Interesting Facts
networking_troubleshooting
- Anti-Primer: Networking Troubleshooting
- Networking Troubleshooting
- Networking Troubleshooting Footguns
- Networking Troubleshooting Street Ops
- Thinking Out Loud: Networking Troubleshooting
- Tools reference
networkpolicy
- Diagnostic Questions
- Grading Rubric
- Investigation: Grafana Dashboard Empty, Prometheus Scrape Blocked by NetworkPolicy
- Remediation: Grafana Dashboard Empty, Prometheus Scrape Blocked by NetworkPolicy
- Symptoms: Grafana Dashboard Empty, Prometheus Scrape Blocked by NetworkPolicy
nginx
nginx_web_servers
nix
node
- Diagnostic Questions
- Grading Rubric
- Investigation: Node NotReady, NIC Firmware Bug, Fix Is Ansible Playbook
- Remediation: Node NotReady, NIC Firmware Bug, Fix Is Ansible Playbook
- Symptoms: Node NotReady, NIC Firmware Bug, Fix Is Ansible Playbook
node-maintenance
- Kubernetes Node Lifecycle & Cluster Upgrades
- Kubernetes Node Lifecycle -- Street Ops
- Kubernetes Node Lifecycle Footguns
- Kubernetes Ops Footguns
- Node Maintenance — Trivia & Interesting Facts
- Practical Kubernetes Ops - Street Ops
- Primer
- Primer
- Skillcheck: Kubernetes Under the Covers
node_maintenance
- Anti-Primer: Node Maintenance
- Node Maintenance
- Node Maintenance - Street-Level Ops
- Node Maintenance Footguns
nodes
nornir
ntp
- Diagnostic Questions
- Grading Rubric
- Investigation: HPA Flapping, Metrics Server Clock Skew, Fix Is NTP Config
- Remediation: HPA Flapping, Metrics Server Clock Skew, Fix Is NTP Config
- Symptoms: HPA Flapping, Metrics Server Clock Skew, Fix Is NTP Config
object-storage
observability
- Alerting Rules Drills
- Alerting Rules Footguns
- Alerting Rules — Trivia & Interesting Facts
- Comparison: Alerting & Paging
- Comparison: Logging Platforms
- Comparison: Metrics Platforms
- Comparison: Tracing Platforms
- Continuous Profiling Footguns
- Continuous Profiling — Street-Level Ops
- Continuous Profiling — Trivia & Interesting Facts
- DORA Metrics — Trivia & Interesting Facts
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Interview: Loki Logs Disappeared
- Interview: Prometheus Target Down
- Investigation: API Latency Spike, BGP Route Leak, Fix Is Network ACL
- Investigation: Alert Storm, Caused by Flapping Health Checks, Fix Is Probe Tuning
- Investigation: Disk Full Alert, Cause Is Runaway Logs, Fix Is Loki Retention
- Investigation: Grafana Dashboard Empty, Prometheus Scrape Blocked by NetworkPolicy
- Investigation: HPA Flapping, Metrics Server Clock Skew, Fix Is NTP Config
- Investigation: Job Queue Backlog, Worker Pod CPU Throttled, Fix Is cgroup Config
- Investigation: Pod OOMKilled, Memory Leak Is in Sidecar, Fix Is Helm Values
- Log Analysis & Alerting Rules - Street-Level Ops
- Log Pipelines - Street-Level Ops
- Log Pipelines Footguns
- Log Pipelines — Trivia & Interesting Facts
- LogQL Drills
- Monitoring Fundamentals - Street-Level Ops
- Monitoring Fundamentals Footguns
- Monitoring Fundamentals — Trivia & Interesting Facts
- Monitoring Migration (Legacy to Modern) - Street-Level Ops
- Monitoring Migration Footguns
- Monitoring Migration — Trivia & Interesting Facts
- Observability Deep Dive - Street Ops
- Observability Deep Dive — Trivia & Interesting Facts
- Observability Drills
- Observability Footguns
- OpenTelemetry - Street-Level Ops
- OpenTelemetry Footguns
- OpenTelemetry — Trivia & Interesting Facts
- Postmortems & SLOs — Trivia & Interesting Facts
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- PromQL Drills
- Remediation: API Latency Spike, BGP Route Leak, Fix Is Network ACL
- Remediation: Alert Storm, Caused by Flapping Health Checks, Fix Is Probe Tuning
- Remediation: Disk Full Alert, Cause Is Runaway Logs, Fix Is Loki Retention
- Remediation: Grafana Dashboard Empty, Prometheus Scrape Blocked by NetworkPolicy
- Remediation: HPA Flapping, Metrics Server Clock Skew, Fix Is NTP Config
- Remediation: Job Queue Backlog, Worker Pod CPU Throttled, Fix Is cgroup Config
- Remediation: Pod OOMKilled, Memory Leak Is in Sidecar, Fix Is Helm Values
- Remediation: Terraform Apply Fails, State Lock Stuck, Root Cause Is DynamoDB Throttle
- Runbook: Loki No Logs
- Runbook: Tempo No Traces
- SLO Tooling Footguns
- SLO Tooling — Street-Level Ops
- SLO Tooling — Trivia & Interesting Facts
- Skillcheck: Alerting Rules
- Skillcheck: Observability
- Symptoms: API Latency Spike, BGP Route Leak, Fix Is Network ACL
- Symptoms: Alert Storm, Caused by Flapping Health Checks, Fix Is Probe Tuning
- Symptoms: Disk Full Alert, Cause Is Runaway Logs, Fix Is Loki Retention
- Symptoms: Grafana Dashboard Empty, Prometheus Scrape Blocked by NetworkPolicy
- Symptoms: HPA Flapping, Metrics Server Clock Skew, Fix Is NTP Config
- Symptoms: Job Queue Backlog, Worker Pod CPU Throttled, Fix Is cgroup Config
- Symptoms: Pod OOMKilled, Memory Leak Is in Sidecar, Fix Is Helm Values
- Symptoms: Terraform Apply Fails, State Lock Stuck, Root Cause Is DynamoDB Throttle
- Synthetic Monitoring Footguns
- Synthetic Monitoring — Street-Level Ops
- Track: Observability
- eBPF Observability — Trivia & Interesting Facts
observability_deep_dive
offensive_security_basics
- Anti-Primer: Offensive Security Basics
- Offensive Security Basics
- Offensive Security Basics — Footguns
- Offensive Security Basics — Street Ops
oidc
- Diagnostic Questions
- Grading Rubric
- Investigation: User Auth Failing, OIDC Cert Expired, Fix Is Cloud KMS Rotation
- Remediation: User Auth Failing, OIDC Cert Expired, Fix Is Cloud KMS Rotation
- Symptoms: User Auth Failing, OIDC Cert Expired, Fix Is Cloud KMS Rotation
on-call
- Incident Command & On-Call - Street-Level Ops
- Incident Command & On-Call Footguns
- Primer
- Primer
- Primer
- Primer
- Runbook Craft - Street-Level Ops
- Runbook Craft Footguns
- The Psychology of Incidents - Street-Level Ops
- The Psychology of Incidents Footguns
- Vendor Management & Escalation - Street-Level Ops
- Vendor Management & Escalation Footguns
on_call
oncall
- On-Call Survival Guides
- On-Call Survival: CI/CD
- On-Call Survival: Cloud/Infrastructure
- On-Call Survival: Databases (PostgreSQL)
- On-Call Survival: Kubernetes
- On-Call Survival: Linux/OS
- On-Call Survival: Networking
- On-Call Survival: Observability
- On-Call Survival: Security
oob-management
- Bare-Metal Provisioning - Street-Level Ops
- Bare-Metal Provisioning Footguns
- Datacenter & Server Hardware - Street Ops
- Datacenter Advanced Operations
- Datacenter Drills
- Datacenter Footguns
- Dell PowerEdge Footguns
- Dell PowerEdge — Street-Level Ops
- IPMI and ipmitool -- Street Ops
- IPMI and ipmitool Footguns
- Primer
- Primer
- Primer
- Primer
- Primer
- Redfish -- Footguns
- Redfish -- Street Ops
- Scenario: OOB Unreachable but Host Responds
- Scenario: Server Won't Boot After Update
- Skillcheck: Datacenter
oom
- Case Study: Node Pressure Evictions
- Grading Checklist
- Interview: Pods OOMKilled
- Kubernetes Node Lifecycle & Cluster Upgrades
- Kubernetes Ops Footguns
- Practical Kubernetes Ops - Street Ops
- Primer
- Questions to Determine
- Solution
- Symptoms
oomkilled
- Anti-Primer: Oomkilled
- Diagnostic Questions
- Grading Rubric
- Investigation: Pod OOMKilled, Memory Leak Is in Sidecar, Fix Is Helm Values
- OOMKilled
- OOMKilled - Street-Level Ops
- OOMKilled Footguns
- Remediation: Pod OOMKilled, Memory Leak Is in Sidecar, Fix Is Helm Values
- Symptoms: Pod OOMKilled, Memory Leak Is in Sidecar, Fix Is Helm Values
opa
- Footguns
- Infrastructure Testing Footguns
- Infrastructure Testing — Street-Level Ops
- Primer
- Primer
- Street ops
open_policy_agent
openfeature
openslo
opentelemetry
- Anti-Primer: Opentelemetry
- OpenTelemetry
- OpenTelemetry - Street-Level Ops
- OpenTelemetry Footguns
- Primer
- Thinking Out Loud: OpenTelemetry
opentofu
operational
- Decision Tree: Certificate Is Expiring — What Do I Do?
- Decision Tree: How to Handle This Config Change?
- Decision Tree: Roll Back or Fix Forward?
- Decision Tree: Scale Up or Optimize First?
- Decision Tree: Should I Automate This?
- Decision Tree: Should I Page Someone?
operational-reasoning
operations
- Cloud Ops Basics — Trivia & Interesting Facts
- Kubernetes Ops — Trivia & Interesting Facts
- Linux Ops — Trivia & Interesting Facts
ops-archaeology
- Answer Key: The 5% That Can't Resolve
- Answer Key: The Alerts That Stopped Firing
- Answer Key: The Certificate That Works Sometimes
- Answer Key: The Cluster That Disagrees With Itself
- Answer Key: The Container That Exits Immediately
- Answer Key: The DR That Looks Ready But Isn't
- Answer Key: The Deploy That Didn't Deploy
- Answer Key: The Gateway That Returns 502
- Answer Key: The Job That Succeeded Wrong
- Answer Key: The Pods That Won't Schedule
- Answer Key: The Replica That Fell Behind
- Answer Key: The Requests That Vanish
- Answer Key: The Service That Won't Start
- Answer Key: The Session Store That Keeps Dying
- Answer Key: The Slow Death Nobody Noticed
- Ops Archaeology: Reverse-Engineering Production Systems
- Ops Archaeology: The 5% That Can't Resolve
- Ops Archaeology: The 5% That Can't Resolve
- Ops Archaeology: The Alerts That Stopped Firing
- Ops Archaeology: The Alerts That Stopped Firing
- Ops Archaeology: The Certificate That Works Sometimes
- Ops Archaeology: The Certificate That Works Sometimes
- Ops Archaeology: The Cluster That Disagrees With Itself
- Ops Archaeology: The Cluster That Disagrees With Itself
- Ops Archaeology: The Container That Exits Immediately
- Ops Archaeology: The Container That Exits Immediately
- Ops Archaeology: The DR That Looks Ready But Isn't
- Ops Archaeology: The DR That Looks Ready But Isn't
- Ops Archaeology: The Deploy That Didn't Deploy
- Ops Archaeology: The Deploy That Didn't Deploy
- Ops Archaeology: The Gateway That Returns 502
- Ops Archaeology: The Gateway That Returns 502
- Ops Archaeology: The Job That Succeeded Wrong
- Ops Archaeology: The Job That Succeeded Wrong
- Ops Archaeology: The Pods That Won't Schedule
- Ops Archaeology: The Pods That Won't Schedule
- Ops Archaeology: The Replica That Fell Behind
- Ops Archaeology: The Replica That Fell Behind
- Ops Archaeology: The Requests That Vanish
- Ops Archaeology: The Requests That Vanish
- Ops Archaeology: The Service That Won't Start
- Ops Archaeology: The Service That Won't Start
- Ops Archaeology: The Session Store That Keeps Dying
- Ops Archaeology: The Session Store That Keeps Dying
- Ops Archaeology: The Slow Death Nobody Noticed
- Ops Archaeology: The Slow Death Nobody Noticed
- Progressive Hints
- Progressive Hints
- Progressive Hints
- Progressive Hints
- Progressive Hints
- Progressive Hints
- Progressive Hints
- Progressive Hints
- Progressive Hints
- Progressive Hints
- Progressive Hints
- Progressive Hints
- Progressive Hints
- Progressive Hints
- Progressive Hints
ops-war-stories
- Ops War Stories & Pattern Recognition - Street-Level Ops
- Ops War Stories & Pattern Recognition Footguns
- Ops War Stories — Trivia & Interesting Facts
- Primer
ops_war_stories
opsec_mistakes
- Anti-Primer: Opsec Mistakes
- OpSec Mistakes - Street-Level Ops
- OpSec Mistakes Footguns
- Opsec Mistakes
orchestration
overlay
package-management
- DNF Package Manager
- Linux Deep Triage
- Linux Ops Footguns
- Linux System Administration - Street Ops
- Primer
- Skillcheck: Linux Fundamentals
package_management
- Anti-Primer: Package Management
- Package Management
- Package Management - Street-Level Ops
- Package Management Footguns
packages
packaging
packer
parca
pattern
- Failure Pattern Catalog
- Pattern: Alerting on Restart (Not Root Cause)
- Pattern: Apply-Without-Reading Manifest
- Pattern: Cache Stampede
- Pattern: Cgroup Soft/Hard Limit Confusion
- Pattern: Clock Skew Ordering
- Pattern: Connection Pool Exhaustion
- Pattern: Deep Health Check Cascade
- Pattern: Deleted-But-Open File
- Pattern: Dependency Chain Collapse
- Pattern: Device Name Confusion
- Pattern: Disk Full (Reserved Blocks Gone)
- Pattern: Dual-Write Divergence
- Pattern: Hardcoded Namespace Override
- Pattern: Health Check Lying
- Pattern: Inode Exhaustion
- Pattern: Memory Limit Equals Request
- Pattern: Metric Cardinality Explosion
- Pattern: Missing Backpressure
- Pattern: Missing Escalation Criteria
- Pattern: Missing Point-in-Time Recovery
- Pattern: Missing absent() Alert
- Pattern: No Circuit Breaker
- Pattern: OOM Without Swap Buffer
- Pattern: PID Exhaustion via Zombies
- Pattern: PVC Reclaim Policy Delete
- Pattern: Percentile Blindness
- Pattern: Port-Forward as Permanent Fix
- Pattern: RAID Rebuild I/O Saturation
- Pattern: Replication Lag at Failover
- Pattern: Restart Avalanche
- Pattern: Retry Amplification
- Pattern: Retry Storm
- Pattern: Rollout Hang (Zero Surge + Zero Unavailable)
- Pattern: Runbook with No Contacts
- Pattern: STP Disabled + Loop Created
- Pattern: Simultaneous Timer Expiry
- Pattern: Stale Image Tag
- Pattern: Stale Leader
- Pattern: StatefulSet OrderedReady Deadlock
- Pattern: Thread Pool Exhaustion
- Pattern: Timeout Assumed = Not Executed
- Pattern: Transaction ID Wraparound
- Pattern: Two-Node Quorum Trap
- Pattern: Unstructured Logging
- Pattern: Untested Backup
- Pattern: Untested Rollback Procedure
- Pattern: Wrong Terminal Tab
- Pattern: Zombie Process Accumulation
- Pattern: latest Tag in Production
- Pattern: ndots:5 Query Amplification
- Pattern: rate() Over Too-Short Window
- Pattern: tmpfs Consuming Hidden RAM
perf
- Anti-Primer: Tracing
- Distributed Tracing - Street-Level Ops
- Distributed Tracing Footguns
- Primer
- Tracing
- perf Profiling
- perf Profiling Footguns
- perf Profiling — Street-Level Ops
perf_profiling
performance
- Lab 17: Performance Tuning
- Linux Kernel Tuning - Street-Level Ops
- Linux Kernel Tuning Footguns
- Linux Performance — Trivia & Interesting Facts
- Load Testing Footguns
- Load Testing — Street-Level Ops
- Performance
- Performance Profiling — Trivia & Interesting Facts
- Primer
- Primer
- perf Profiling
personal-dev
- CSS Fundamentals — Trivia & Interesting Facts
- Career Engineering — Trivia & Interesting Facts
- Change Management — Trivia & Interesting Facts
- Corporate IT Fluency — Trivia & Interesting Facts
- Debugging Methodology — Trivia & Interesting Facts
pipeline
- Diagnostic Questions
- Grading Rubric
- Investigation: Container Image Vuln Scanner False Positive, Blocks Deploy Pipeline
- Remediation: Container Image Vuln Scanner False Positive, Blocks Deploy Pipeline
- Symptoms: Container Image Vuln Scanner False Positive, Blocks Deploy Pipeline
pipes_and_redirection
pki
- Primer
- Primer
- TLS & Certificates Ops - Street-Level Ops
- cert-manager Footguns
- cert-manager — Street-Level Ops
platform-engineering
- Backstage — Trivia & Interesting Facts
- Platform Engineering Footguns
- Platform Engineering Patterns - Street-Level Ops
- Platform Engineering — Trivia & Interesting Facts
- Primer
platform_engineering
- Anti-Primer: Platform Engineering
- Comparison: Kubernetes Templating
- Comparison: Local Dev for Kubernetes
- Comparison: Local Kubernetes Clusters
- How We Got Here: Developer Experience
- Lab 20: Platform Engineering
- Platform Engineering Patterns
pods
policy-as-code
policy-engines
- Interview: Kyverno Blocking Deploys
- Multi-Tenancy Patterns - Street-Level Ops
- Multi-Tenancy Patterns Footguns
- Policy Engine Drills
- Policy Engine Footguns
- Policy Engines - Street-Level Ops
- Primer
- Primer
- Runbook: Kyverno Blocking Workloads
- Skillcheck: Policy Engines
policy_engines
postgresql
- Anti-Primer: Postgresql
- PostgreSQL Footguns
- PostgreSQL Operations
- PostgreSQL Operations - Street-Level Ops
postmortem
- Postmortem Anthology
- Postmortem: AWS AZ Network Degradation Triggers Cascading Health Check Failures
- Postmortem: AWS Credentials Committed to Public Repo — Caught by Pre-Commit Hook
- Postmortem: Alert Routing Sends All Pages to Decommissioned Channel
- Postmortem: Ansible Playbook Targets Production Instead of Staging
- Postmortem: BGP Route Leak Sends Customer Traffic Through Monitoring VLAN
- Postmortem: Core Switch Firmware Bug Causes Cascading Network Partition
- Postmortem: Custom Controller Missing Backoff Overwhelms API Server
- Postmortem: DNS CNAME Chain Breaks After Load Balancer Rename
- Postmortem: Debug Build Deployed to Production via Copy-Paste Error
- Postmortem: Expired Wildcard TLS Certificate Causes Full API Gateway Outage
- Postmortem: Go Dependency Update Silently Changes Default Timeout — Caught in Canary
- Postmortem: Helm Values Mismatch Routes Staging Traffic to Production Database
- Postmortem: Kernel TCP Regression After Security Patch
- Postmortem: Memory Leak in Log Shipping Agent Causes Fleet-Wide OOM Kills
- Postmortem: Missing Circuit Breaker Lets Redis Failure Cascade to All Services
- Postmortem: Missing Runbook Extends CrashLoopBackOff Recovery by 45 Minutes
- Postmortem: No Review Gate on Terraform Destroy Leads to Wrong Account Teardown
- Postmortem: On-Call Handoff Gap Leaves Alerts Unacknowledged for 3 Hours
- Postmortem: Production Database Deleted by Terraform Apply on Wrong Workspace
- Postmortem: Prometheus Cardinality Explosion from Debug Labels
- Postmortem: Race Condition in Distributed Lock Manager Corrupts Shared State
- Postmortem: Resource Quota Misconfiguration Blocks All Deployments
- Postmortem: S3 Bucket Policy Change Nearly Deletes All Backup Archives
- Postmortem: SSD Firmware Bug Causes Silent Bit Corruption
- Postmortem: Single etcd Member Disk Full Degrades Control Plane
- Postmortem: Stale Docker Base Image Ships Known CVE to Production
- Postmortem: UPS Battery Degradation Causes Rack Power Loss During Utility Blip
- Postmortem: Unbounded Kafka Topic Exhausts Broker Disk
- Postmortem: Unbounded Retry Storm Takes Down Payment Processing
- Postmortem: Wildcard Ingress Rule Nearly Exposes Internal Admin Panel
postmortem-slo
- Incident Postmortem & SLO/SLI - Street-Level Ops
- Postmortem & SLO Drills
- Postmortem & SLO Footguns
- Primer
- Primer
- SRE Practices - Street-Level Ops
- SRE Practices Footguns
- Skillcheck: Postmortems & SLOs
postmortem_slo
power
- Anti-Primer: Power
- Power
- Power & UPS - Street-Level Ops
- Power & UPS Footguns
- Power — Trivia & Interesting Facts
powershell
- Anti-Primer: Powershell
- PowerShell
- PowerShell Footguns
- PowerShell Street Ops
- PowerShell — Trivia & Interesting Facts
probes
- Deep Dive: Kubernetes Pod Lifecycle
- Diagnostic Questions
- Grading Rubric
- Interview: Deployment Stuck Progressing
- Investigation: Alert Storm, Caused by Flapping Health Checks, Fix Is Probe Tuning
- Kubernetes Node Lifecycle & Cluster Upgrades
- Kubernetes Ops Footguns
- Practical Kubernetes Ops - Street Ops
- Primer
- Remediation: Alert Storm, Caused by Flapping Health Checks, Fix Is Probe Tuning
- Runbook: Readiness Probe Failed
- Symptoms: Alert Storm, Caused by Flapping Health Checks, Fix Is Probe Tuning
- Track: Kubernetes Core
proc_filesystem
- /proc Filesystem
- /proc Filesystem - Street-Level Ops
- /proc Filesystem Footguns
- Anti-Primer: Proc Filesystem
process-management
process_management
processes
- Linux Signals & Process Control - Footguns
- Linux Signals & Process Control - Street-Level Ops
- Primer
- Process Management — Trivia & Interesting Facts
production-readiness
production_readiness
- Production Readiness Assessment
- Production Readiness Review: Answer Key
- Production Readiness Review: Scoring Guide
- Production Readiness Review: Study Plans
- Production Readiness Review: System Architecture
productivity
profiling
progressive-delivery
progressive_delivery
prometheus
- Alerting Rules Drills
- Alerting Rules Footguns
- Capacity Planning - Street-Level Ops
- Capacity Planning Footguns
- Diagnostic Questions
- Grading Rubric
- Interview: Prometheus Target Down
- Investigation: Grafana Dashboard Empty, Prometheus Scrape Blocked by NetworkPolicy
- Log Analysis & Alerting Rules - Street-Level Ops
- Monitoring Fundamentals - Street-Level Ops
- Monitoring Fundamentals Footguns
- Monitoring Migration (Legacy to Modern) - Street-Level Ops
- Monitoring Migration Footguns
- Observability Deep Dive - Street Ops
- Observability Drills
- Observability Footguns
- OpenTelemetry - Street-Level Ops
- OpenTelemetry Footguns
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- PromQL Drills
- Remediation: Grafana Dashboard Empty, Prometheus Scrape Blocked by NetworkPolicy
- SLO Tooling Footguns
- SLO Tooling — Street-Level Ops
- Skillcheck: Alerting Rules
- Skillcheck: Observability
- Symptoms: Grafana Dashboard Empty, Prometheus Scrape Blocked by NetworkPolicy
- Synthetic Monitoring Footguns
- Synthetic Monitoring — Street-Level Ops
- Track: Observability
prometheus_deep_dive
- Anti-Primer: Prometheus Deep Dive
- Prometheus Deep Dive
- Prometheus Deep Dive - Street-Level Ops
- Prometheus Deep Dive Footguns
- Thinking Out Loud: Prometheus Deep Dive
proxy
pulumi
pxe
- Bare-Metal Provisioning - Street-Level Ops
- Bare-Metal Provisioning Footguns
- Case Study: PXE Boot Fails UEFI Mismatch
- Datacenter & Server Hardware - Street Ops
- Datacenter Advanced Operations
- Datacenter Footguns
- Grading Checklist: PXE Boot Fails - UEFI Mismatch
- Primer
- Primer
- Questions: PXE Boot Fails - UEFI Mismatch
- Solution: PXE Boot Fails - UEFI Mismatch
- Symptoms: PXE Boot Fails - UEFI Mismatch
pyroscope
pyrra
python
- Primer
- Primer
- Python Async & Concurrency
- Python Debugging
- Python Debugging Footguns
- Python Debugging — Street-Level Ops
- Python Debugging — Trivia & Interesting Facts
- Python Packaging
- Python Packaging Footguns
- Python Packaging — Street-Level Ops
- Python Packaging — Trivia & Interesting Facts
- Python for Infrastructure — Trivia & Interesting Facts
- Trivia compendium
python-automation
- Primer
- Python Drills
- Python for Infrastructure - Street-Level Ops
- Python for Infrastructure Footguns
- Skillcheck: Python Automation
python_async_concurrency
- Anti-Primer: Python Async Concurrency
- Python Async & Concurrency - Street-Level Ops
- Python Async & Concurrency Footguns
python_debugging
python_infra
python_packaging
rabbitmq
rack-ops
- Datacenter & Server Hardware - Street Ops
- Datacenter Advanced Operations
- Datacenter Drills
- Datacenter Footguns
- Primer
- Scenario: Thermal Throttling
- Skillcheck: Datacenter
raid
- Case Study: RAID Degraded Rebuild Latency
- Datacenter & Server Hardware - Street Ops
- Datacenter Advanced Operations
- Datacenter Footguns
- Dell PowerEdge Footguns
- Dell PowerEdge — Street-Level Ops
- Diagnostic Questions
- Disk & Storage Ops — Trivia & Interesting Facts
- Grading Checklist: RAID Degraded Rebuild Latency
- Grading Rubric
- Investigation: Database Replication Lag, Root Cause Is RAID Degradation
- Primer
- Primer
- Primer
- Questions: RAID Degraded Rebuild Latency
- Remediation: Database Replication Lag, Root Cause Is RAID Degradation
- Scenario: RAID Array Degraded
- Skillcheck: Datacenter
- Solution: RAID Degraded Rebuild Latency
- Storage Operations - Street-Level Ops
- Storage Operations Footguns
- Symptoms: Database Replication Lag, Root Cause Is RAID Degradation
- Symptoms: RAID Degraded Rebuild Latency
rbac
- Diagnostic Questions
- Grading Rubric
- Investigation: Service Mesh 503s, Envoy Misconfigured, Root Cause Is RBAC Policy
- Remediation: Service Mesh 503s, Envoy Misconfigured, Root Cause Is RBAC Policy
- Symptoms: Service Mesh 503s, Envoy Misconfigured, Root Cause Is RBAC Policy
redfish
- Anti-Primer: Redfish
- Primer
- Redfish -- Footguns
- Redfish -- Street Ops
- Redfish API
- Redfish — Trivia & Interesting Facts
redhat
redis
reference
regex
- Anti-Primer: awk
- Anti-Primer: sed
- Primer
- Regex & Text Wrangling - Street-Level Ops
- Regex & Text Wrangling Footguns
- Regex & Text Wrangling — Trivia & Interesting Facts
- grep & Regular Expressions
regex_text_wrangling
registry
- Diagnostic Questions
- Grading Rubric
- Investigation: CI Pipeline Fails, Docker Layer Cache Corruption, Fix Is Registry GC
- Remediation: CI Pipeline Fails, Docker Layer Cache Corruption, Fix Is Registry GC
- Symptoms: CI Pipeline Fails, Docker Layer Cache Corruption, Fix Is Registry GC
rego
reliability
replication
- Diagnostic Questions
- Grading Rubric
- Investigation: Database Replication Lag, Root Cause Is RAID Degradation
- Remediation: Database Replication Lag, Root Cause Is RAID Degradation
- Symptoms: Database Replication Lag, Root Cause Is RAID Degradation
resource-management
rhce
- Anti-Primer: RHCE
- Primer
- RHCE (EX294) — Footguns & Pitfalls
- RHCE (EX294) — Street Ops
- RHCE Exam Prep
rhel
ripgrep
- Modern CLI Drills
- Modern CLI Tools - Street Ops
- Modern CLI Tools Footguns
- Primer
- Skillcheck: Modern CLI Tools
- ripgrep — Trivia & Interesting Facts
routing
- Anti-Primer: Routing
- Case Study: Asymmetric Routing One Direction
- Grading Checklist
- Network Traps & Deep Debugging
- Networking - Street Ops
- Networking Footguns
- Primer
- Questions to Determine
- Routing
- Routing - Street-Level Ops
- Routing Footguns
- Scenario: Asymmetric Routing
- Skillcheck: Networking Fundamentals
- Solution: Asymmetric Routing / One-Direction Failure
- Symptoms: Asymmetric Routing / One-Direction Failure
rsync¶
runbook¶
- Operational Runbooks
- Runbook
- Runbook: ArgoCD Out of Sync
- Runbook: Certificate Renewal Failed
- Runbook: Disaster Recovery
- Runbook: HPA Not Scaling
- Runbook: Helm Upgrade Failed
- Runbook: ImagePullBackOff
- Runbook: Ingress 404
- Runbook: Istio 503 Errors
- Runbook: Kyverno Blocking Workloads
- Runbook: Loki No Logs
- Runbook: NetworkPolicy Block
- Runbook: Pod Eviction
- Runbook: RBAC Forbidden
- Runbook: Readiness Probe Failed
- Runbook: Secret Rotation
- Runbook: Tempo No Traces
- Runbook: VPC IP Exhaustion
- Runbook: Velero Backup & Restore
- Runbook: etcd Backup & Restore
runbook-craft¶
- Primer
- Runbook Craft - Street-Level Ops
- Runbook Craft Footguns
- Runbook Craft — Trivia & Interesting Facts
runbook_craft¶
runtime-security¶
s3¶
- AWS S3 Deep Dive
- Primer
- S3-Compatible Object Storage Footguns
- S3-Compatible Object Storage — Street-Level Ops
s3-object-storage¶
s3_object_storage¶
scanner¶
- Diagnostic Questions
- Grading Rubric
- Investigation: Container Image Vuln Scanner False Positive, Blocks Deploy Pipeline
- Remediation: Container Image Vuln Scanner False Positive, Blocks Deploy Pipeline
- Symptoms: Container Image Vuln Scanner False Positive, Blocks Deploy Pipeline
scenario¶
- Interview: CI Vuln Scan Failed
- Interview: Certificate Expired
- Interview: Config Drift Detected
- Interview: Cost Spike Investigation
- Interview: Database Failover During Deploy
- Interview: Deployment Stuck Progressing
- Interview: Docker Container Debugging
- Interview: GitOps Drift Detected
- Interview: HPA Not Scaling
- Interview: Helm Upgrade Broke Prod
- Interview: Ingress 404
- Interview: Kyverno Blocking Deploys
- Interview: Linux Server Slow
- Interview: Loki Logs Disappeared
- Interview: Pods OOMKilled
- Interview: Prometheus Target Down
- Interview: RBAC Forbidden
- Interview: Secret Leaked to Git
- Interview: Server Won't POST
- Interview: Service Mesh 503s
- Interview: Vault Token Expired
- Interview: etcd Space Exceeded
- Scenario: Asymmetric Routing
- Scenario: DNS Looks Fine but App Fails
- Scenario: Duplex Mismatch
- Scenario: MTU Blackhole
- Scenario: NIC Flapping / LACP Mismatch
- Scenario: OOB Unreachable but Host Responds
- Scenario: RAID Array Degraded
- Scenario: Server Won't Boot After Update
- Scenario: Thermal Throttling
- Scenario: VLAN Trunk Mismatch
- Scenario: etcd Troubleshooting
- Scenarios
scheduling¶
- Cron Scheduling — Trivia & Interesting Facts
- Kubernetes Pods & Scheduling - Street Ops
- Kubernetes Pods & Scheduling Footguns
- Primer
screen¶
scripting_rosetta¶
secrets-management¶
- Interview: Secret Leaked to Git
- Interview: Vault Token Expired
- Primer
- Runbook: Secret Rotation
- Secrets Management - Street-Level Ops
- Secrets Management Drills
- Secrets Management Footguns
- Skillcheck: Secrets Management
secrets_management¶
- Anti-Primer: Secrets Management
- How We Got Here: Secrets Management
- Secrets Management
- Thinking Out Loud: Secrets Management
security¶
- Audit Logging — Trivia & Interesting Facts
- Compliance & Audit Automation - Street-Level Ops
- Compliance & Audit Automation Footguns
- Compliance Automation — Trivia & Interesting Facts
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Diagnostic Questions
- Disaster Recovery & Backup Engineering - Street-Level Ops
- Disaster Recovery Footguns
- Falco — Trivia & Interesting Facts
- Footguns
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- Grading Rubric
- HashiCorp Vault - Street-Level Ops
- HashiCorp Vault Footguns
- HashiCorp Vault — Trivia & Interesting Facts
- Infrastructure Forensics - Street-Level Ops
- Infrastructure Forensics Footguns
- Infrastructure Forensics — Trivia & Interesting Facts
- Interview: CI Vuln Scan Failed
- Interview: Certificate Expired
- Interview: Secret Leaked to Git
- Interview: Vault Token Expired
- Investigation: Container Image Vuln Scanner False Positive, Blocks Deploy Pipeline
- Investigation: DNS Looks Broken, TLS Is Expired, Fix Is in Cert-Manager
- Investigation: Deployment Stuck, ImagePull Auth Failure, Fix Is Vault Secret Rotation
- Investigation: Service Mesh 503s, Envoy Misconfigured, Root Cause Is RBAC Policy
- Investigation: User Auth Failing, OIDC Cert Expired, Fix Is Cloud KMS Rotation
- Kubernetes RBAC — Trivia & Interesting Facts
- LDAP & Identity Management - Street-Level Ops
- LDAP & Identity Management Footguns
- LDAP & Identity — Trivia & Interesting Facts
- Linux Hardening — Trivia & Interesting Facts
- OPSEC Mistakes — Trivia & Interesting Facts
- Offensive Security Basics — Trivia & Interesting Facts
- Open Policy Agent — Trivia & Interesting Facts
- Ops-Focused Security Basics - Street Ops
- Policy Engines — Trivia & Interesting Facts
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Remediation: API Latency Spike, BGP Route Leak, Fix Is Network ACL
- Remediation: Container Image Vuln Scanner False Positive, Blocks Deploy Pipeline
- Remediation: DNS Looks Broken, TLS Is Expired, Fix Is in Cert-Manager
- Remediation: Deployment Stuck, ImagePull Auth Failure, Fix Is Vault Secret Rotation
- Remediation: Service Mesh 503s, Envoy Misconfigured, Root Cause Is RBAC Policy
- Remediation: User Auth Failing, OIDC Cert Expired, Fix Is Cloud KMS Rotation
- Runbook: Certificate Renewal Failed
- Runbook: Secret Rotation
- Runtime Security with Falco Footguns
- Runtime Security with Falco — Street-Level Ops
- SELinux & AppArmor - Street-Level Ops
- SELinux & AppArmor Footguns
- SELinux & AppArmor — Trivia & Interesting Facts
- SELinux & Linux Hardening - Street-Level Ops
- SELinux & Linux Hardening Footguns
- SSH Deep Dive
- SSH Deep Dive — Footguns
- SSH Deep Dive — Street-Level Ops
- Secrets Management - Street-Level Ops
- Secrets Management Drills
- Secrets Management Footguns
- Secrets Management — Trivia & Interesting Facts
- Security Basics — Trivia & Interesting Facts
- Security Drills
- Security Footguns
- Security Scanning — Trivia & Interesting Facts
- Skillcheck: Secrets Management
- Skillcheck: Security (Expanded)
- Skillcheck: TLS & PKI
- Street ops
- Supply Chain Security - Street-Level Ops
- Supply Chain Security Footguns
- Symptoms: API Latency Spike, BGP Route Leak, Fix Is Network ACL
- Symptoms: Container Image Vuln Scanner False Positive, Blocks Deploy Pipeline
- Symptoms: DNS Looks Broken, TLS Is Expired, Fix Is in Cert-Manager
- Symptoms: Deployment Stuck, ImagePull Auth Failure, Fix Is Vault Secret Rotation
- Symptoms: Service Mesh 503s, Envoy Misconfigured, Root Cause Is RBAC Policy
- Symptoms: User Auth Failing, OIDC Cert Expired, Fix Is Cloud KMS Rotation
- TLS & PKI Drills
- TLS & Certificates Ops - Street-Level Ops
- TLS & Certificates Ops Footguns
- TLS & Certificates — Trivia & Interesting Facts
- cert-manager — Trivia & Interesting Facts
- iptables & nftables
- iptables & nftables - Street-Level Ops
- iptables & nftables Footguns
security-scanning¶
- Interview: CI Vuln Scan Failed
- Ops-Focused Security Basics - Street Ops
- Primer
- Security Drills
- Security Footguns
- Skillcheck: Security (Expanded)
security_basics¶
security_scanning¶
- Anti-Primer: Security Scanning
- Comparison: Image Scanners
- Comparison: Policy Engines
- Comparison: Secrets Management
- Decision Tree: A Secret Was Exposed
- Decision Tree: Container Running as Root
- Decision Tree: Dependency Has a CVE
- Decision Tree: I Found a Vulnerability
- Decision Tree: Suspicious Activity Detected
- Overview
- Security Scanning
- Security Scanning - Street-Level Ops
- Security Scanning Footguns
selinux¶
selinux_apparmor¶
server-hardware¶
- Bare-Metal Provisioning - Street-Level Ops
- Bare-Metal Provisioning Footguns
- Case Study: Memory ECC Errors Increasing
- Case Study: Thermal Throttle Fan Failure
- Dell PowerEdge Footguns
- Dell PowerEdge — Street-Level Ops
- Grading Checklist: Memory ECC Errors Increasing
- Grading Checklist: Thermal Throttle - Fan Failure
- IPMI and ipmitool -- Street Ops
- IPMI and ipmitool Footguns
- Interview: Server Won't POST
- Primer
- Primer
- Primer
- Primer
- Primer
- Questions: Memory ECC Errors Increasing
- Questions: Thermal Throttle - Fan Failure
- Redfish -- Footguns
- Redfish -- Street Ops
- Scenario: NIC Flapping / LACP Mismatch
- Scenario: RAID Array Degraded
- Scenario: Thermal Throttling
- Server Hardware — Trivia & Interesting Facts
- Skillcheck: Datacenter
- Solution: Memory ECC Errors Increasing
- Solution: Thermal Throttle - Fan Failure
- Symptoms: Memory ECC Errors Increasing
- Symptoms: Thermal Throttle - Fan Failure
- Virtualization - Street-Level Ops
- Virtualization Footguns
server_hardware¶
- Anti-Primer: Server Hardware
- Server Hardware
- Server Hardware - Street-Level Ops
- Server Hardware Footguns
serverless¶
service-discovery¶
service-management¶
service-mesh¶
- Diagnostic Questions
- Footguns
- Footguns
- Grading Rubric
- Interview: Service Mesh 503s
- Investigation: Service Mesh 503s, Envoy Misconfigured, Root Cause Is RBAC Policy
- Istio Service Mesh Footguns
- Istio Service Mesh — Street-Level Ops
- Istio — Trivia & Interesting Facts
- Primer
- Primer
- Primer
- Primer
- Remediation: Service Mesh 503s, Envoy Misconfigured, Root Cause Is RBAC Policy
- Runbook: Istio 503 Errors
- Service Mesh - Street-Level Ops
- Service Mesh Drills
- Service Mesh Footguns
- Skillcheck: Service Mesh
- Street ops
- Street ops
- Symptoms: Service Mesh 503s, Envoy Misconfigured, Root Cause Is RBAC Policy
service_mesh¶
services¶
- Kubernetes Services & Ingress - Street Ops
- Kubernetes Services & Ingress Footguns
- Linux Ops Systemd
- Primer
shell¶
sidecar¶
- Diagnostic Questions
- Grading Rubric
- Investigation: Pod OOMKilled, Memory Leak Is in Sidecar, Fix Is Helm Values
- Remediation: Pod OOMKilled, Memory Leak Is in Sidecar, Fix Is Helm Values
- Symptoms: Pod OOMKilled, Memory Leak Is in Sidecar, Fix Is Helm Values
signals¶
- Linux Signals & Process Control - Footguns
- Linux Signals & Process Control - Street-Level Ops
- Primer
skillcheck¶
slo-tooling¶
slo_tooling¶
sloth¶
smtp¶
solution¶
- Solution: Lab Runtime 01 -- Readiness Probe Failure
- Solution: Lab Runtime 02 -- HPA Live Scaling
- Solution: Lab Runtime 03 -- Observability Target Down
- Solution: Lab Runtime 04 -- Loki No Logs
- Solution: Lab Runtime 05 -- Helm Upgrade Rollback
- Solution: Lab Runtime 06 -- Trivy Fail to Green
- Solution: Lab Runtime 07 -- GitOps Sync and Drift
- Solution: Lab Runtime 08 -- Resource Limits OOM
- Solutions
spf¶
spine-leaf¶
sql¶
sql_fundamentals¶
sqlite¶
- Anti-Primer: SQLite
- SQLite Footguns
- SQLite Operations & Internals
- SQLite Operations & Internals - Street-Level Ops
sre¶
- Anti-Primer: SRE Practices
- Capacity Planning - Street-Level Ops
- Capacity Planning Footguns
- Capacity Planning — Trivia & Interesting Facts
- Change Management - Street-Level Ops
- Change Management Footguns
- Disaster Recovery — Trivia & Interesting Facts
- Primer
- Primer
- Primer
- SRE Practices
- SRE Practices - Street-Level Ops
- SRE Practices Footguns
sre-practices¶
ssh¶
- Diagnostic Questions
- Diagnostic Questions
- Grading Rubric
- Grading Rubric
- Investigation: Ansible Playbook Hangs, SSH Agent Forwarding Broken, Root Cause Is Firewall Rule
- Investigation: SSH Timeout, MTU Mismatch, Fix Is Terraform Variable
- Primer
- Remediation: Ansible Playbook Hangs, SSH Agent Forwarding Broken, Root Cause Is Firewall Rule
- Remediation: SSH Timeout, MTU Mismatch, Fix Is Terraform Variable
- SSH Deep Dive
- SSH Deep Dive — Footguns
- SSH Deep Dive — Street-Level Ops
- Symptoms: Ansible Playbook Hangs, SSH Agent Forwarding Broken, Root Cause Is Firewall Rule
- Symptoms: SSH Timeout, MTU Mismatch, Fix Is Terraform Variable
ssh_deep_dive¶
state-lock¶
- Diagnostic Questions
- Grading Rubric
- Investigation: Terraform Apply Fails, State Lock Stuck, Root Cause Is DynamoDB Throttle
- Remediation: Terraform Apply Fails, State Lock Stuck, Root Cause Is DynamoDB Throttle
- Symptoms: Terraform Apply Fails, State Lock Stuck, Root Cause Is DynamoDB Throttle
storage¶
- AWS S3 Deep Dive
- Ceph — Trivia & Interesting Facts
- Comparison: Caching
- Comparison: Messaging
- Comparison: Relational Databases
- Disk & Storage Ops
- Inodes
- Kubernetes Storage — Trivia & Interesting Facts
- Linux Data Hoarding
- Linux Ops Storage
- Linux Ops Storage — Trivia & Interesting Facts
- Primer
- Storage (SAN/NAS/DAS)
- Storage Operations - Street-Level Ops
- Storage Operations Footguns
storage-ops¶
stp¶
- Anti-Primer: STP
- Cisco Fundamentals -- Street Ops
- Cisco Fundamentals Footguns
- Primer
- STP (Spanning Tree)
- STP - Street-Level Ops
- STP Footguns
strace¶
- Anti-Primer: Strace
- Primer
- strace
- strace Footguns
- strace — Street-Level Ops
- strace — Trivia & Interesting Facts
street-ops¶
subnetting_and_ip_addressing¶
- Anti-Primer: Subnetting And IP Addressing
- Subnetting & IP Addressing
- Subnetting and IP Addressing - Street-Level Ops
- Subnetting and IP Addressing Footguns
supply-chain-security¶
- Supply Chain Security - Street-Level Ops
- Supply Chain Security Footguns
- Supply Chain Security — Trivia & Interesting Facts
supply_chain_security¶
synthetic-monitoring¶
- Primer
- Synthetic Monitoring Footguns
- Synthetic Monitoring — Street-Level Ops
- Synthetic Monitoring — Trivia & Interesting Facts
synthetic_monitoring¶
sysctl¶
system-behavior¶
systemctl_journalctl¶
systemd¶
- Case Study: Systemd Service Flapping
- Cron & Job Scheduling - Street-Level Ops
- Cron & Job Scheduling Footguns
- Deep Dive: Linux Boot Sequence
- Deep Dive: Systemd Architecture
- Grading Checklist
- Linux Deep Triage
- Linux Ops Footguns
- Linux Ops Systemd
- Linux Ops — systemd — Trivia & Interesting Facts
- Linux System Administration - Street Ops
- Primer
- Primer
- Primer
- Questions to Determine
- Skillcheck: Linux Fundamentals
- Solution
- Symptoms
- systemctl & journalctl Footguns
- systemctl & journalctl Street Ops
systems-thinking¶
- Debugging Methodology - Street-Level Ops
- Debugging Methodology Footguns
- Primer
- Primer
- Systems Thinking Footguns
- Systems Thinking for Engineers - Street-Level Ops
systems_thinking¶
tailscale¶
- Anti-Primer: Tailscale
- Tailscale & Zero Trust Networking
- Tailscale - Street-Level Ops
- Tailscale Footguns
tar_and_compression¶
tcp¶
tcp-ip¶
- Case Study: NAT Exhaustion Intermittent
- DHCP & IP Address Management - Street-Level Ops
- DHCP & IP Address Management Footguns
- Deep Dive: Linux Network Packet Flow
- Grading Checklist
- Network Traps & Deep Debugging
- Networking - Street Ops
- Networking Drills
- Networking Footguns
- Primer
- Primer
- Questions to Determine
- Skillcheck: Networking Fundamentals
- Solution: NAT Port Exhaustion / Intermittent Failures
- Symptoms: NAT Port Exhaustion / Intermittent Failures
tcp_ip¶
- Anti-Primer: Networking
- Comparison: CNI Plugins
- Comparison: Ingress Controllers
- Comparison: Service Meshes
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist
- Grading Checklist: DHCP Not Working on Remote VLAN
- Grading Checklist: DNS Resolution Taking 5+ Seconds Intermittently
- Grading Checklist: Jumbo Frames Enabled But Some Paths Failing
- Grading Checklist: Multicast Traffic Not Crossing Router
- Grading Checklist: OSPF Adjacency Stuck in ExStart/Exchange State
- Grading Checklist: Proxy ARP Causing Unexpected Routing Behavior
- Grading Checklist: TCP Connections Reset After Idle Period
- Grading Checklist: Traffic From Specific Source Not Taking Expected Path
- Incident Replay: ARP Flux — Duplicate IP Detection
- Incident Replay: Asymmetric Routing — Traffic Works One Direction Only
- Incident Replay: BGP Peer Flapping
- Incident Replay: DHCP Relay Broken
- Incident Replay: DNS Resolution Slow
- Incident Replay: DNS Split-Horizon Confusion
- Incident Replay: Duplex Mismatch Symptoms
- Incident Replay: Firewall Shadow Rule
- Incident Replay: Jumbo Frames Partial Deployment
- Incident Replay: LACP Mismatch — One Link Hot
- Incident Replay: MTU Blackhole — TLS Stalls
- Incident Replay: Multicast Not Crossing Router
- Incident Replay: NAT Exhaustion — Intermittent Connectivity
- Incident Replay: Network Loop — Broadcast Storm
- Incident Replay: OSPF Stuck in ExStart
- Incident Replay: Proxy ARP Causing Issues
- Incident Replay: SSL Certificate Chain Incomplete
- Incident Replay: Source Routing Policy Miss
- Incident Replay: TCP RST After Idle
- Incident Replay: VLAN Trunk Mistag
- Lab 3: Networking Fundamentals
- Networking Deep Dive
- Overview
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions to Determine
- Questions: DHCP Not Working on Remote VLAN
- Questions: DNS Resolution Taking 5+ Seconds Intermittently
- Questions: Jumbo Frames Enabled But Some Paths Failing
- Questions: Multicast Traffic Not Crossing Router
- Questions: OSPF Adjacency Stuck in ExStart/Exchange State
- Questions: Proxy ARP Causing Unexpected Routing Behavior
- Questions: TCP Connections Reset After Idle Period
- Questions: Traffic From Specific Source Not Taking Expected Path
- Solution: ARP Flux / Duplicate IP
- Solution: BGP Peer Flapping
- Solution: DHCP Not Working on Remote VLAN
- Solution: DNS Resolution Taking 5+ Seconds Intermittently
- Solution: DNS Split-Horizon Confusion
- Solution: Duplex Mismatch
- Solution: Firewall Shadow Rule
- Solution: Jumbo Frames Enabled But Some Paths Failing
- Solution: LACP Mismatch / One Link Hot
- Solution: Multicast Traffic Not Crossing Router
- Solution: OSPF Adjacency Stuck in ExStart/Exchange State
- Solution: Proxy ARP Causing Unexpected Routing Behavior
- Solution: TCP Connections Reset After Idle Period
- Solution: Traffic From Specific Source Not Taking Expected Path
- Solution: VLAN Trunk Mistag
- Symptoms: ARP Flux / Duplicate IP
- Symptoms: BGP Peer Flapping
- Symptoms: DHCP Not Working on Remote VLAN
- Symptoms: DNS Resolution Taking 5+ Seconds Intermittently
- Symptoms: DNS Split-Horizon Confusion
- Symptoms: Duplex Mismatch
- Symptoms: Firewall Shadow Rule
- Symptoms: Jumbo Frames Enabled But Some Paths Failing
- Symptoms: LACP Mismatch / One Link Hot
- Symptoms: Multicast Traffic Not Crossing Router
- Symptoms: OSPF Adjacency Stuck in ExStart/Exchange State
- Symptoms: Proxy ARP Causing Unexpected Routing Behavior
- Symptoms: TCP Connections Reset After Idle Period
- Symptoms: Traffic From Specific Source Not Taking Expected Path
- Symptoms: VLAN Trunk Mistag
tcp_ip_deep_dive¶
tempo¶
- Observability Deep Dive - Street Ops
- Observability Footguns
- Primer
- Runbook: Tempo No Traces
- Skillcheck: Observability
- Track: Observability
terminal¶
terminal-internals¶
terminal_internals¶
terraform¶
- Anti-Primer: Terraform
- Comparison: Configuration Management
- Comparison: Infrastructure as Code Tools
- Deep Dive: Terraform State Internals
- Diagnostic Questions
- Diagnostic Questions
- Grading Rubric
- Grading Rubric
- Infrastructure as Code with Terraform - Street Ops
- Investigation: SSH Timeout, MTU Mismatch, Fix Is Terraform Variable
- Investigation: Terraform Apply Fails, State Lock Stuck, Root Cause Is DynamoDB Throttle
- Primer
- Primer
- Remediation: SSH Timeout, MTU Mismatch, Fix Is Terraform Variable
- Remediation: Terraform Apply Fails, State Lock Stuck, Root Cause Is DynamoDB Throttle
- Skillcheck: Terraform / IaC
- Symptoms: SSH Timeout, MTU Mismatch, Fix Is Terraform Variable
- Symptoms: Terraform Apply Fails, State Lock Stuck, Root Cause Is DynamoDB Throttle
- Terraform / IaC
- Terraform Deep Dive - Footguns
- Terraform Deep Dive - Street Ops
- Terraform Drills
- Terraform Footguns
- Terraform — Trivia & Interesting Facts
- Thinking Out Loud: Terraform
- Track: Infrastructure
terraform_deep_dive¶
terratest¶
testing¶
text-processing¶
- Linux Text Processing
- Linux Text Processing - Street-Level Ops
- Linux Text Processing Footguns
- Linux Text Processing — Trivia & History
- Primer
- Primer
- Primer
- Primer
- Regex & Text Wrangling — Trivia & Interesting Facts
- awk — Footguns
- awk — Street-Level Ops
- awk: The Record/Field Processor
- grep & Regular Expressions
- grep & Regular Expressions - Footguns
- grep & Regular Expressions - Street-Level Ops
- sed — Footguns
- sed — Street-Level Ops
- sed: The Stream Editor
threat-detection¶
tls¶
- Case Study: SSL Cert Chain Incomplete
- Diagnostic Questions
- Grading Checklist: TLS Works From Some Clients But Fails From Others
- Grading Rubric
- HTTP Protocol Footguns
- HTTP Protocol — Street-Level Ops
- Investigation: DNS Looks Broken, TLS Is Expired, Fix Is in Cert-Manager
- Nginx & Web Servers - Street-Level Ops
- Nginx & Web Servers Footguns
- Primer
- Primer
- Primer
- Primer
- Questions: TLS Works From Some Clients But Fails From Others
- Remediation: DNS Looks Broken, TLS Is Expired, Fix Is in Cert-Manager
- Scenario: DNS Looks Fine but App Fails
- Solution: TLS Works From Some Clients But Fails From Others
- Symptoms: DNS Looks Broken, TLS Is Expired, Fix Is in Cert-Manager
- Symptoms: TLS Works From Some Clients But Fails From Others
- TLS & Certificates Ops - Street-Level Ops
- TLS & Certificates Ops Footguns
- cert-manager Footguns
- cert-manager — Street-Level Ops
tls-pki¶
- Interview: Certificate Expired
- Runbook: Certificate Renewal Failed
- Skillcheck: TLS & PKI
- TLS & PKI Drills
tls_certificates_ops¶
tmux¶
tmux_and_screen¶
toil_reduction¶
tools¶
- Modern CLI — Trivia & Interesting Facts
- Tools Reference
- fd — Trivia & Interesting Facts
- fzf — Trivia & Interesting Facts
topic¶
topic-pack¶
- AI Tools for DevOps - Footguns
- AI Tools for DevOps - Street Ops
- AI/ML Ops Footguns
- API Gateways & Ingress - Street-Level Ops
- API Gateways & Ingress Footguns
- AWS EC2
- AWS EC2 - Street-Level Ops
- AWS EC2 Footguns
- AWS IAM
- AWS IAM - Street-Level Ops
- AWS IAM Footguns
- AWS Lambda
- AWS Lambda - Street-Level Ops
- AWS Lambda Footguns
- AWS Networking
- AWS Networking - Street-Level Ops
- AWS Networking Footguns
- AWS Route 53
- AWS Route 53 - Street-Level Ops
- AWS Route 53 Footguns
- AWS S3 Deep Dive
- Advanced Bash Footguns
- Advanced Bash for Ops - Street-Level Ops
- Alerting Rules Footguns
- Ansible Deep Dive - Footguns
- Ansible Deep Dive - Street Ops
- Ansible Footguns
- Ansible for Infrastructure Automation - Street Ops
- Ansible: idempotence + modules vs plugins vs collections
- Ansible: inventory — hosts, groups, vars, targeting
- Ansible: playbook vs play vs task vs role vs handler
- Ansible: variable precedence
- Argo Workflows Footguns
- Argo Workflows — Street-Level Ops
- ArgoCD & GitOps Footguns
- ArgoCD & GitOps — Street-Level Ops
- BGP EVPN / VXLAN Footguns
- BGP EVPN / VXLAN — Street-Level Ops
- Backstage - Street-Level Ops
- Backstage Footguns
- Bare-Metal Provisioning - Street-Level Ops
- Bare-Metal Provisioning Footguns
- Binary and Floating Point Footguns
- Binary and Floating Point — Street-Level Ops
- Binary and Floats
- Btrfs: subvolume, snapshot, reflink, CoW
- CI/CD Footguns
- CI/CD Pipelines - Street Ops
- CI/CD as a System
- CSS Fundamentals
- CSS Fundamentals Footguns
- CSS Fundamentals — Street-Level Ops
- Capacity Planning - Street-Level Ops
- Capacity Planning Footguns
- Career Engineering Footguns
- Career Engineering for Ops People - Street-Level Ops
- Ceph Storage Footguns
- Ceph Storage — Street-Level Ops
- Change Management - Street-Level Ops
- Change Management Footguns
- Chaos Engineering & Fault Injection - Street-Level Ops
- Chaos Engineering Footguns
- Cilium & eBPF Networking - Street-Level Ops
- Cilium & eBPF Networking Footguns
- Cisco Fundamentals -- Street Ops
- Cisco Fundamentals Footguns
- Cloud Deep-Dive Footguns
- Cloud Operations Basics - Street Ops
- Cloud Ops Footguns
- Cloud Provider Deep-Dive - Street-Level Ops
- Compliance & Audit Automation - Street-Level Ops
- Compliance & Audit Automation Footguns
- Container Base Images — Footguns & Pitfalls
- Container Base Images — Street Ops
- Container vs VM
- Containers Deep Dive
- Containers Deep Dive - Footguns & Pitfalls
- Containers Deep Dive - Street-Level Ops
- Continuous Profiling Footguns
- Continuous Profiling — Street-Level Ops
- Corporate IT Fluency - Street-Level Ops
- Corporate IT Fluency Footguns
- Cost Optimization & FinOps - Street-Level Ops
- Cron & Job Scheduling - Street-Level Ops
- Cron & Job Scheduling Footguns
- Crossplane - Street-Level Ops
- Crossplane Footguns
- DHCP & IP Address Management - Street-Level Ops
- DHCP & IP Address Management Footguns
- DNF Package Manager
- DNS Deep Dive - Footguns
- DNS Deep Dive - Street-Level Ops
- DNS Operations - Street-Level Ops
- DNS Operations Footguns
- DNS: Stub Resolver vs Recursive Resolver vs Authoritative Server
- DNSSEC & DNS Security Footguns
- DNSSEC & DNS Security — Street-Level Ops
- DORA Metrics & DevEx Footguns
- DORA Metrics & DevEx — Street-Level Ops
- Dagger - Street-Level Ops
- Dagger Footguns
- Database Operations - Street-Level Ops
- Database Ops Footguns
- Datacenter & Server Hardware - Street Ops
- Datacenter Advanced Operations
- Datacenter Footguns
- Debian & Ubuntu — Footguns & Pitfalls
- Debian & Ubuntu — Street Ops
- Debugging Methodology - Street-Level Ops
- Debugging Methodology Footguns
- Dell PowerEdge Footguns
- Dell PowerEdge — Street-Level Ops
- Deployment vs ReplicaSet vs Pod
- Disaster Recovery & Backup Engineering - Street-Level Ops
- Disaster Recovery Footguns
- Disk & Storage Ops
- Distributed Systems Footguns
- Distributed Systems Fundamentals — Street-Level Ops
- Edge & IoT Infrastructure - Street-Level Ops
- Edge & IoT Infrastructure Footguns
- Email Infrastructure Footguns
- Email Infrastructure — Street-Level Ops
- Feature Flags Footguns
- Feature Flags — Street-Level Ops
- File vs inode vs pathname vs symlink
- FinOps Footguns
- Fleet Operations Footguns
- Fleet Operations at Scale - Street-Level Ops
- Footguns
- Footguns
- Footguns
- Footguns
- Footguns
- Git Footguns
- Git Workflows & Branching Strategies
- Git for DevOps Engineers - Street Ops
- Git: commit vs branch vs tag vs HEAD
- Git: rebase vs merge
- Git: working tree vs index vs repository
- GitHub Actions - Street-Level Ops
- GitHub Actions Footguns
- HAProxy & Nginx Load Balancing Footguns
- HAProxy & Nginx for Ops - Street-Level Ops
- HTTP Protocol Footguns
- HTTP Protocol — Street-Level Ops
- HashiCorp Vault - Street-Level Ops
- HashiCorp Vault Footguns
- Homelab & Learning Infrastructure - Street-Level Ops
- Homelab Footguns
- IPMI and ipmitool -- Street Ops
- IPMI and ipmitool Footguns
- Image vs Container
- Incident Command & On-Call - Street-Level Ops
- Incident Command & On-Call Footguns
- Incident Postmortem & SLO/SLI - Street-Level Ops
- Infrastructure Forensics - Street-Level Ops
- Infrastructure Forensics Footguns
- Infrastructure Testing Footguns
- Infrastructure Testing — Street-Level Ops
- Infrastructure as Code with Terraform - Street Ops
- Inodes
- Istio Service Mesh Footguns
- Istio Service Mesh — Street-Level Ops
- K8s Concept Chain — Footguns
- K8s Concept Chain — Street-Level Ops
- Kernel Troubleshooting - Street-Level Ops
- Kernel Troubleshooting Footguns
- Kubernetes Concept Chain
- Kubernetes Control Plane as Reconciliation Engine
- Kubernetes Debugging -- Street Ops
- Kubernetes Debugging Footguns
- Kubernetes Ecosystem - Street-Level Ops
- Kubernetes Ecosystem Footguns
- Kubernetes Node Lifecycle & Cluster Upgrades
- Kubernetes Node Lifecycle -- Street Ops
- Kubernetes Node Lifecycle Footguns
- Kubernetes Ops Footguns
- Kubernetes Pods & Scheduling - Street Ops
- Kubernetes Pods & Scheduling Footguns
- Kubernetes Services & Ingress - Street Ops
- Kubernetes Services & Ingress Footguns
- Kustomize - Street-Level Ops
- Kustomize Footguns
- LDAP & Identity Management - Street-Level Ops
- LDAP & Identity Management Footguns
- LPIC / LFCS — Footguns & Pitfalls
- LPIC / LFCS — Street Ops
- Legacy System Archaeology - Street-Level Ops
- Legacy System Archaeology Footguns
- Linux Boot Process
- Linux Boot Process — Footguns & Pitfalls
- Linux Boot Process — Street Ops
- Linux Data Hoarding
- Linux Deep Triage
- Linux Distribution Comparison — Footguns & Pitfalls
- Linux Distribution Comparison — Street Ops
- Linux Kernel Tuning - Street-Level Ops
- Linux Kernel Tuning Footguns
- Linux Logging
- Linux Logging — Footguns
- Linux Logging — Street Ops
- Linux Memory Management
- Linux Memory Management — Footguns
- Linux Memory Management — Street Ops
- Linux Ops Footguns
- Linux Ops Storage
- Linux Ops Systemd
- Linux Performance Tuning - Street-Level Ops
- Linux Performance Tuning Footguns
- Linux Signals & Process Control - Footguns
- Linux Signals & Process Control - Street-Level Ops
- Linux System Administration - Street Ops
- Linux Text Processing
- Linux Text Processing - Street-Level Ops
- Linux Text Processing Footguns
- Linux Text Processing — Trivia & History
- Linux Users & Permissions
- Linux Users and Permissions — Footguns & Pitfalls
- Linux Users and Permissions — Street Ops
- Linux: kernel vs userspace vs distro
- Load Testing Footguns
- Load Testing — Street-Level Ops
- Log Analysis & Alerting Rules - Street-Level Ops
- Log Pipelines - Street-Level Ops
- Log Pipelines Footguns
- Logs vs Metrics vs Traces
- Mellanox Switches
- Modern CLI Tools - Street Ops
- Modern CLI Tools Footguns
- Modern CLI Workflows Footguns
- Modern CLI Workflows
- MongoDB Operations Footguns
- MongoDB Operations — Street-Level Ops
- Monitoring Fundamentals - Street-Level Ops
- Monitoring Fundamentals Footguns
- Monitoring Migration (Legacy to Modern) - Street-Level Ops
- Monitoring Migration Footguns
- Multi-Cluster & Federation - Exercises & Reference
- Multi-Tenancy Patterns - Street-Level Ops
- Multi-Tenancy Patterns Footguns
- MySQL / MariaDB Operations Footguns
- MySQL / MariaDB Operations — Street-Level Ops
- Network Automation Footguns
- Network Automation — Street-Level Ops
- Network Traps & Deep Debugging
- Networking - Street Ops
- Networking Footguns
- Nginx & Web Servers - Street-Level Ops
- Nginx & Web Servers Footguns
- Nix / NixOS - Street-Level Ops
- Nix / NixOS Footguns
- Observability Deep Dive - Street Ops
- Observability Footguns
- OpenTelemetry - Street-Level Ops
- OpenTelemetry Footguns
- OpenTofu - Street-Level Ops
- OpenTofu Footguns
- Ops War Stories & Pattern Recognition - Street-Level Ops
- Ops War Stories & Pattern Recognition Footguns
- Ops-Focused Security Basics - Street Ops
- Permissions: mode bits vs ownership vs ACLs vs capabilities
- Persistent Volume vs Persistent Volume Claim
- Pipes & Redirection - Footguns
- Pipes & Redirection - Street-Level Ops
- Platform Engineering Footguns
- Platform Engineering Patterns - Street-Level Ops
- Pod vs Container (Kubernetes)
- Policy Engine Footguns
- Policy Engines - Street-Level Ops
- PostgreSQL Footguns
- PostgreSQL Operations - Street-Level Ops
- Postmortem & SLO Footguns
- Practical Kubernetes Ops - Street Ops
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Primer
- Process Management - Street-Level Ops
- Process Management Footguns
- Process vs program vs service
- Progressive Delivery Footguns
- Progressive Delivery — Street-Level Ops
- Pulumi - Street-Level Ops
- Pulumi Footguns
- Python Async & Concurrency
- Python Debugging
- Python Debugging Footguns
- Python Debugging — Street-Level Ops
- Python Packaging
- Python Packaging Footguns
- Python Packaging — Street-Level Ops
- Python for Infrastructure - Street-Level Ops
- Python for Infrastructure Footguns
- RAID vs Backup vs Snapshot
- RHCE (EX294) — Footguns & Pitfalls
- RHCE (EX294) — Street Ops
- RabbitMQ Footguns
- RabbitMQ Operations - Street-Level Ops
- Redfish -- Footguns
- Redfish -- Street Ops
- Redis Footguns
- Redis Operations - Street-Level Ops
- Regex & Text Wrangling - Street-Level Ops
- Regex & Text Wrangling Footguns
- Reverse Proxy vs Load Balancer
- Runbook Craft - Street-Level Ops
- Runbook Craft Footguns
- Runtime Security with Falco Footguns
- Runtime Security with Falco — Street-Level Ops
- S3-Compatible Object Storage Footguns
- S3-Compatible Object Storage — Street-Level Ops
- SELinux & AppArmor - Street-Level Ops
- SELinux & AppArmor Footguns
- SELinux & Linux Hardening - Street-Level Ops
- SELinux & Linux Hardening Footguns
- SLO Tooling Footguns
- SLO Tooling — Street-Level Ops
- SQL Fundamentals Footguns
- SQL Fundamentals — Street-Level Ops
- SQLite Footguns
- SQLite Operations & Internals - Street-Level Ops
- SRE Practices - Street-Level Ops
- SRE Practices Footguns
- SSH Deep Dive
- SSH Deep Dive — Footguns
- SSH Deep Dive — Street-Level Ops
- Secrets Management - Street-Level Ops
- Secrets Management Footguns
- Security Footguns
- Service Mesh - Street-Level Ops
- Service Mesh Footguns
- Service vs Ingress (Kubernetes Networking)
- Storage Operations - Street-Level Ops
- Storage Operations Footguns
- Storage Stack: Disk, Partition, LVM, Filesystem, Mount
- Street ops
- Street ops
- Street ops
- Street ops
- Street ops
- Supply Chain Security - Street-Level Ops
- Supply Chain Security Footguns
- Synthetic Monitoring Footguns
- Synthetic Monitoring — Street-Level Ops
- Systemd Units: Unit, Service, Target, Start vs Enable
- Systems Thinking Footguns
- Systems Thinking for Engineers - Street-Level Ops
- TCP/IP Deep Dive - Street-Level Ops
- TCP/IP Deep Dive Footguns
- TLS & Certificates Ops - Street-Level Ops
- TLS & Certificates Ops Footguns
- Tailscale - Street-Level Ops
- Tailscale Footguns
- Terminal Internals
- Terminal Internals - Street Ops
- Terminal Internals Footguns
- Terraform Deep Dive - Footguns
- Terraform Deep Dive - Street Ops
- Terraform Footguns
- Terraform: Desired State Engine
- The Ops of AI/ML Workloads - Street-Level Ops
- The Psychology of Incidents - Street-Level Ops
- The Psychology of Incidents Footguns
- Trivia compendium
- Trivia compendium
- Trivia compendium
- VPN & Tunneling - Street-Level Ops
- VPN & Tunneling Footguns
- VS Code Footguns
- VS Code for DevOps - Street Ops
- Vendor Management & Escalation - Street-Level Ops
- Vendor Management & Escalation Footguns
- Virtualization - Street-Level Ops
- Virtualization Footguns
- WebAssembly for Infrastructure - Street-Level Ops
- WebAssembly for Infrastructure Footguns
- Wireshark / tshark / tcpdump - Street-Level Ops
- Wireshark / tshark / tcpdump Footguns
- YAML, JSON & Config Formats - Footguns
- YAML, JSON & Config Formats - Street Ops
- awk — Footguns
- awk — Street-Level Ops
- awk: The Record/Field Processor
- cert-manager Footguns
- cert-manager — Street-Level Ops
- cgroups & Linux Namespaces - Street Ops
- cgroups & Namespaces Footguns
- curl & wget
- curl & wget — Footguns
- curl & wget — Street-Level Ops
- eBPF & Modern Linux Observability - Street-Level Ops
- eBPF & Modern Linux Observability Footguns
- find - Footguns & Pitfalls
- find - Street-Level Ops
- gRPC - Street-Level Ops
- gRPC Footguns
- grep & Regular Expressions
- grep & Regular Expressions - Footguns
- grep & Regular Expressions - Street-Level Ops
- iptables & nftables
- iptables & nftables - Street-Level Ops
- iptables & nftables Footguns
- perf Profiling
- perf Profiling Footguns
- perf Profiling — Street-Level Ops
- rsync - Street Ops
- rsync Footguns
- sed — Footguns
- sed — Street-Level Ops
- sed: The Stream Editor
- strace Footguns
- strace — Street-Level Ops
- systemctl & journalctl Footguns
- systemctl & journalctl Street Ops
- tar & Compression - Footguns
- tar & Compression - Street-Level Ops
- tmux & screen
- xargs - Footguns & Pitfalls
- xargs - Street Ops
tracing¶
- OpenTelemetry - Street-Level Ops
- OpenTelemetry Footguns
- Primer
- Primer
- Primer
- Tracing — Trivia & Interesting Facts
- perf Profiling Footguns
- perf Profiling — Street-Level Ops
- strace Footguns
- strace — Street-Level Ops
training¶
- Coverage Priorities
- How to Contribute
- Start Here
- Tools Reference
- Topic Buildout Process
- Welcome
transport¶
trivia¶
- AI DevOps Tools — Trivia & Interesting Facts
- AI/ML Ops — Trivia & Interesting Facts
- API Gateways — Trivia & Interesting Facts
- ARP — Trivia & Interesting Facts
- AWS EC2 — Trivia & Interesting Facts
- AWS IAM — Trivia & Interesting Facts
- AWS Lambda — Trivia & Interesting Facts
- AWS Networking — Trivia & Interesting Facts
- AWS Troubleshooting — Trivia & Interesting Facts
- Advanced Bash — Trivia & Interesting Facts
- Alerting Rules — Trivia & Interesting Facts
- Ansible — Trivia & Interesting Facts
- Argo CD & GitOps — Trivia & Interesting Facts
- Argo Workflows — Trivia & Interesting Facts
- Audit Logging — Trivia & Interesting Facts
- Azure Troubleshooting — Trivia & Interesting Facts
- BGP EVPN VXLAN — Trivia & Interesting Facts
- Backstage — Trivia & Interesting Facts
- Backup & Restore — Trivia & Interesting Facts
- Bare Metal Provisioning — Trivia & Interesting Facts
- Binary & Floats — Trivia & Interesting Facts
- CI/CD — Trivia & Interesting Facts
- CSS Fundamentals — Trivia & Interesting Facts
- Capacity Planning — Trivia & Interesting Facts
- Career Engineering — Trivia & Interesting Facts
- Ceph — Trivia & Interesting Facts
- Change Management — Trivia & Interesting Facts
- Chaos Engineering — Trivia & Interesting Facts
- Cilium — Trivia & Interesting Facts
- Cisco Fundamentals for DevOps — Trivia & Interesting Facts
- Claude Code — Trivia & Interesting Facts
- Cloud Deep Dive — Trivia & Interesting Facts
- Cloud Ops Basics — Trivia & Interesting Facts
- Compliance Automation — Trivia & Interesting Facts
- Container Base Images — Trivia & Interesting Facts
- Containers Deep Dive — Trivia & Interesting Facts
- Continuous Profiling — Trivia & Interesting Facts
- Corporate IT Fluency — Trivia & Interesting Facts
- CrashLoopBackOff — Trivia & Interesting Facts
- Cron Scheduling — Trivia & Interesting Facts
- Crossplane — Trivia & Interesting Facts
- DHCP & IPAM — Trivia & Interesting Facts
- DNS Operations — Trivia & Interesting Facts
- DNSSEC — Trivia & Interesting Facts
- DORA Metrics — Trivia & Interesting Facts
- Dagger — Trivia & Interesting Facts
- Database Internals — Trivia & Interesting Facts
- Database Operations — Trivia & Interesting Facts
- Datacenter — Trivia & Interesting Facts
- Debian & Ubuntu — Trivia & Interesting Facts
- Debugging Methodology — Trivia & Interesting Facts
- Dell PowerEdge — Trivia & Interesting Facts
- Disaster Recovery — Trivia & Interesting Facts
- Disk & Storage Ops — Trivia & Interesting Facts
- Distributed Systems — Trivia & Interesting Facts
- Docker — Trivia & Interesting Facts
- Edge & IoT — Trivia & Interesting Facts
- Elasticsearch — Trivia & Interesting Facts
- Email Infrastructure — Trivia & Interesting Facts
- Envoy Proxy — Trivia & Interesting Facts
- Falco — Trivia & Interesting Facts
- Feature Flags — Trivia & Interesting Facts
- FinOps — Trivia & Interesting Facts
- Firewalls — Trivia & Interesting Facts
- Firmware — Trivia & Interesting Facts
- Fleet Ops — Trivia & Interesting Facts
- GCP Troubleshooting — Trivia & Interesting Facts
- Git Advanced — Trivia & Interesting Facts
- Git — Trivia & Interesting Facts
- GitHub Actions — Trivia & Interesting Facts
- GitOps — Trivia & Interesting Facts
- HTTP Protocol — Trivia & Interesting Facts
- HashiCorp Vault — Trivia & Interesting Facts
- Helm — Trivia & Interesting Facts
- Homelab — Trivia & Interesting Facts
- IPMI & ipmitool — Trivia & Interesting Facts
- Incident Command — Trivia & Interesting Facts
- Incident Psychology — Trivia & Interesting Facts
- Incident Triage — Trivia & Interesting Facts
- Infrastructure Forensics — Trivia & Interesting Facts
- Infrastructure Testing — Trivia & Interesting Facts
- Inodes — Trivia & Interesting Facts
- Istio — Trivia & Interesting Facts
- Kafka — Trivia & Interesting Facts
- Kernel Troubleshooting — Trivia & Interesting Facts
- Knowledge Compendiums
- Kubernetes Debugging Playbook — Trivia & Interesting Facts
- Kubernetes Ecosystem — Trivia & Interesting Facts
- Kubernetes Networking — Trivia & Interesting Facts
- Kubernetes Node Lifecycle — Trivia & Interesting Facts
- Kubernetes Ops — Trivia & Interesting Facts
- Kubernetes RBAC — Trivia & Interesting Facts
- Kubernetes Storage — Trivia & Interesting Facts
- Kustomize — Trivia & Interesting Facts
- LACP — Trivia & Interesting Facts
- LDAP & Identity — Trivia & Interesting Facts
- LPIC & LFCS — Trivia & Interesting Facts
- Legacy Archaeology — Trivia & Interesting Facts
- Linux Boot Process — Trivia & Interesting Facts
- Linux Distro Comparison — Trivia & Interesting Facts
- Linux Hardening — Trivia & Interesting Facts
- Linux Logging — Trivia & Interesting Facts
- Linux Memory Management — Trivia & Interesting Facts
- Linux Ops Storage — Trivia & Interesting Facts
- Linux Ops — Trivia & Interesting Facts
- Linux Ops — systemd — Trivia & Interesting Facts
- Linux Performance — Trivia & Interesting Facts
- Linux Users and Permissions — Trivia & Interesting Facts
- Load Balancing — Trivia & Interesting Facts
- Load Testing — Trivia & Interesting Facts
- Log Pipelines — Trivia & Interesting Facts
- MTU — Trivia & Interesting Facts
- Make & Build Systems — Trivia & Interesting Facts
- Modern CLI Workflows — Trivia & Interesting Facts
- Modern CLI — Trivia & Interesting Facts
- MongoDB Operations — Trivia & Interesting Facts
- Monitoring Fundamentals — Trivia & Interesting Facts
- Monitoring Migration — Trivia & Interesting Facts
- Mounts & Filesystems — Trivia & Interesting Facts
- Multi-Tenancy — Trivia & Interesting Facts
- MySQL Operations — Trivia & Interesting Facts
- NAT — Trivia & Interesting Facts
- Network Automation — Trivia & Interesting Facts
- Networking Troubleshooting — Trivia & Interesting Facts
- Networking — Trivia & Interesting Facts
- Nix — Trivia & Interesting Facts
- Node Maintenance — Trivia & Interesting Facts
- OOMKilled — Trivia & Interesting Facts
- OPSEC Mistakes — Trivia & Interesting Facts
- Observability Deep Dive — Trivia & Interesting Facts
- Offensive Security Basics — Trivia & Interesting Facts
- Open Policy Agent — Trivia & Interesting Facts
- OpenTelemetry — Trivia & Interesting Facts
- OpenTofu — Trivia & Interesting Facts
- Ops War Stories — Trivia & Interesting Facts
- Package Management — Trivia & Interesting Facts
- Packer — Trivia & Interesting Facts
- Performance Profiling — Trivia & Interesting Facts
- Platform Engineering — Trivia & Interesting Facts
- Policy Engines — Trivia & Interesting Facts
- PostgreSQL — Trivia & Interesting Facts
- Postmortems & SLOs — Trivia & Interesting Facts
- Power — Trivia & Interesting Facts
- PowerShell — Trivia & Interesting Facts
- Process Management — Trivia & Interesting Facts
- Progressive Delivery — Trivia & Interesting Facts
- Pulumi — Trivia & Interesting Facts
- Python Debugging — Trivia & Interesting Facts
- Python Packaging — Trivia & Interesting Facts
- Python for Infrastructure — Trivia & Interesting Facts
- RHCE — Trivia & Interesting Facts
- RabbitMQ — Trivia & Interesting Facts
- Redfish — Trivia & Interesting Facts
- Redis — Trivia & Interesting Facts
- Regex & Text Wrangling — Trivia & Interesting Facts
- Routing — Trivia & Interesting Facts
- Runbook Craft — Trivia & Interesting Facts
- S3 & Object Storage — Trivia & Interesting Facts
- SELinux & AppArmor — Trivia & Interesting Facts
- SLO Tooling — Trivia & Interesting Facts
- SQL Fundamentals — Trivia & Interesting Facts
- SQLite — Trivia & Interesting Facts
- SRE Practices — Trivia & Interesting Facts
- SSH Deep Dive — Trivia & Interesting Facts
- STP — Trivia & Interesting Facts
- Secrets Management — Trivia & Interesting Facts
- Security Basics — Trivia & Interesting Facts
- Security Scanning — Trivia & Interesting Facts
- Server Hardware — Trivia & Interesting Facts
- Service Mesh — Trivia & Interesting Facts
- Shuffled Trivia Compendium
- Storage Ops — Trivia & Interesting Facts
- Subnetting & IP Addressing — Trivia & Interesting Facts
- Supply Chain Security — Trivia & Interesting Facts
- Synthetic Monitoring — Trivia & Interesting Facts
- Systems Thinking — Trivia & Interesting Facts
- TLS & Certificates — Trivia & Interesting Facts
- Tailscale — Trivia & Interesting Facts
- Terminal Internals — Trivia & Interesting Facts
- Terraform — Trivia & Interesting Facts
- Tracing — Trivia & Interesting Facts
- Trivia
- Trivia
- Trivia
- Trivia compendium
- Trivia compendium
- Trivia compendium
- VLANs — Trivia & Interesting Facts
- VPN & Tunneling — Trivia & Interesting Facts
- VS Code — Trivia & Interesting Facts
- Vendor Management — Trivia & Interesting Facts
- Virtualization — Trivia & Interesting Facts
- WebAssembly Infrastructure — Trivia & Interesting Facts
- Wireshark — Trivia & Interesting Facts
- awk — Trivia & Interesting Facts
- cert-manager — Trivia & Interesting Facts
- curl & wget — Trivia & Interesting Facts
- eBPF Observability — Trivia & Interesting Facts
- etcd — Trivia & Interesting Facts
- fd — Trivia & Interesting Facts
- find — Trivia & Interesting Facts
- fzf — Trivia & Interesting Facts
- gRPC — Trivia & Interesting Facts
- iptables & nftables — Trivia & Interesting Facts
- jq — Trivia & Interesting Facts
- nginx Web Servers — Trivia & Interesting Facts
- ripgrep — Trivia & Interesting Facts
- rsync — Trivia & Interesting Facts
- sed — Trivia & Interesting Facts
- strace — Trivia & Interesting Facts
- xargs — Trivia & Interesting Facts
tty¶
ubuntu¶
- Debian & Ubuntu — Footguns & Pitfalls
- Debian & Ubuntu — Street Ops
- Debian & Ubuntu — Trivia & Interesting Facts
- Primer
users-permissions¶
- Linux Users & Permissions
- Linux Users and Permissions — Footguns & Pitfalls
- Linux Users and Permissions — Street Ops
- Primer
vault¶
- Diagnostic Questions
- Grading Rubric
- Investigation: Deployment Stuck, ImagePull Auth Failure, Fix Is Vault Secret Rotation
- Remediation: Deployment Stuck, ImagePull Auth Failure, Fix Is Vault Secret Rotation
- Symptoms: Deployment Stuck, ImagePull Auth Failure, Fix Is Vault Secret Rotation
vendor-management¶
- Primer
- Vendor Management & Escalation - Street-Level Ops
- Vendor Management & Escalation Footguns
- Vendor Management — Trivia & Interesting Facts
vendor_management¶
version-control¶
- Git Advanced — Trivia & Interesting Facts
- Git Workflows & Branching Strategies
- Git — Trivia & Interesting Facts
virtualization¶
- Anti-Primer: Virtualization
- Primer
- Virtualization
- Virtualization - Street-Level Ops
- Virtualization Footguns
- Virtualization — Trivia & Interesting Facts
vlan¶
- Diagnostic Questions
- Grading Rubric
- Investigation: Backup Job Failing, iSCSI Target Unreachable, Fix Is VLAN Config
- Remediation: Backup Job Failing, iSCSI Target Unreachable, Fix Is VLAN Config
- Symptoms: Backup Job Failing, iSCSI Target Unreachable, Fix Is VLAN Config
vlans¶
- Anti-Primer: VLANs
- Case Study: Network Loop Broadcast Storm
- Cisco Fundamentals -- Street Ops
- Cisco Fundamentals Footguns
- Grading Checklist: Network Experiencing Broadcast Storm and High CPU on Switches
- Network Traps & Deep Debugging
- Networking - Street Ops
- Networking Footguns
- Primer
- Primer
- Questions: Network Experiencing Broadcast Storm and High CPU on Switches
- Scenario: VLAN Trunk Mismatch
- Skillcheck: Networking Fundamentals
- Solution: Network Experiencing Broadcast Storm and High CPU on Switches
- Symptoms: Network Experiencing Broadcast Storm and High CPU on Switches
- VLAN Footguns
- VLANs
- VLANs - Street-Level Ops
vmware¶
vpn¶
- Anti-Primer: VPN Tunneling
- Primer
- VPN & Tunneling
- VPN & Tunneling - Street-Level Ops
- VPN & Tunneling Footguns
vscode¶
- Anti-Primer: VS Code
- Primer
- VS Code Footguns
- VS Code for DevOps
- VS Code for DevOps - Street Ops
- VS Code — Trivia & Interesting Facts
vulnerability¶
- Diagnostic Questions
- Grading Rubric
- Investigation: Container Image Vuln Scanner False Positive, Blocks Deploy Pipeline
- Remediation: Container Image Vuln Scanner False Positive, Blocks Deploy Pipeline
- Symptoms: Container Image Vuln Scanner False Positive, Blocks Deploy Pipeline
vxlan¶
war-story¶
- DNS: The Eternal Enemy
- From Monolith to Misery
- It Was Always DNS
- One Character from Disaster
- The 3 AM Cert Expiry
- The Auth System Swap
- The Autoscaler That Almost Bankrupted Us
- The Backup We Never Tested
- The CI/CD Pipeline Rewrite
- The Cascading Timeout
- The Case of the Missing Packets
- The Clock Skew Catastrophe
- The Cloud Bill Surprise
- The Config Management Lie
- The Container That Worked on My Machine
- The Cost of No Staging
- The DNS Provider Switch
- The Database Migration Weekend
- The Database That Wasn't Backing Up
- The Datacenter Exit
- The Deploy That Ate Prod
- The Documentation That Didn't Exist
- The Firewall Rule That Blocked Itself
- The Git Deploy That Deployed Nothing
- The Intern and the DROP TABLE
- The Kubernetes Migration That Took a Year
- The Leap Second Incident
- The Load Balancer Lie
- The Log That Filled the Disk
- The Memory Leak Marathon
- The Metrics That Lied
- The Monitoring Save
- The Monitoring We Ignored
- The Network Change Window
- The Observability Migration
- The Permissions Avalanche
- The Phantom Latency Spike
- The Postmortem Nobody Read
- The Rollback That Wasn't
- The SSL Handshake Timeout
- The Secret Rotation We Postponed
- The Secrets in the Repo
- The Single Point of Failure
- The Split-Brain Nightmare
- The Technical Debt Interest Payment
- The Terraform Plan That Would Have Destroyed Prod
- The Terraform State Disaster
- The Test We Never Wrote
- The Zombie Cron Job
- War Stories Collection
- When the Queue Backed Up
wasm-infrastructure¶
- WebAssembly Infrastructure — Trivia & Interesting Facts
- WebAssembly for Infrastructure - Street-Level Ops
- WebAssembly for Infrastructure Footguns
wasm_infrastructure¶
whats-new¶
windows¶
wireshark¶
- Anti-Primer: Wireshark
- Wireshark & Packet Analysis
- Wireshark / tshark / tcpdump - Street-Level Ops
- Wireshark / tshark / tcpdump Footguns
workflow-automation¶
workflows¶
xargs¶
yaml¶
yaml_json_config¶
- Anti-Primer: YAML JSON Config
- Anti-Primer: jq
- YAML, JSON & Config Formats
- jq
- jq - Street-Level Ops
- jq Footguns