Python — The Complete Guide: From Zero to Infrastructure Automation

Topics: Python basics, data structures, file I/O, error handling, subprocess, pathlib, requests, boto3, paramiko, JSON/YAML, regex, argparse, logging, collections, concurrency, debugging, packaging, virtual environments, Jinja2, Click, Kubernetes client, testing
Strategy: Build-up from absolute zero (Bash comparison throughout) with war stories, trivia, and drills woven in
Level: L0–L2 (Zero → Foundations → Operations)
Time: 4–5 hours (designed for deep study in one or multiple sittings)
Prerequisites: Familiarity with the Linux command line and Bash scripting. No prior Python experience required.


The Mission

You're a terminal-native engineer. You've been writing Bash for years — maybe decades. Your scripts work. They glue infrastructure together with pipes, awk, and sed. But your latest 500-line Bash monitoring script needs JSON parsing, retry logic with exponential backoff, parallel execution across 50 hosts, Slack webhook integration, and proper error handling. You stare at the tangled quoting and subshell variable scoping and realize: this is where Bash stops being the right tool.

By the end of this guide you'll be able to write Python scripts that replace your Bash toolkit, automate cloud infrastructure with real APIs, process data at scale, debug production issues, and build CLI tools your team will actually use. This is the one document you need to go from "I only know Bash" to "I can automate anything in Python."


Table of Contents

  1. Why Python? The Decision Line
  2. Running the Thing — REPL, Scripts, Shebangs
  3. Variables and Types — Not Everything Is a String
  4. f-Strings — printf That Doesn't Hate You
  5. Data Structures — Lists, Dicts, and Counter
  6. Conditionals, Loops, and Truthiness
  7. Functions — No More Subshell Surprises
  8. File I/O — open(), with, and No More Redirects
  9. Error Handling — Why Exceptions Beat Exit Codes
  10. String Methods — Your sed/awk/cut Replacement
  11. The Import System and Standard Library
  12. subprocess — The Escape Hatch to Shell
  13. pathlib — Files Without the Pain
  14. JSON and YAML — The Languages Infrastructure Speaks
  15. requests — The Better curl
  16. Regular Expressions — grep/sed in Python
  17. Logging — Structured Output for Production
  18. CLI Tools — argparse and Click
  19. boto3 — Automating AWS
  20. paramiko — SSH from Python
  21. Jinja2 — Templating Config Files
  22. Concurrency — Threading, Multiprocessing, asyncio
  23. Data Wrangling — Log Parsing at Scale
  24. Kubernetes Client — Automating K8s
  25. Debugging — From print() to pdb to Production
  26. Virtual Environments and Packaging
  27. Testing — pytest for Infrastructure Scripts
  28. Footguns — Mistakes That Turn Automation Into Liability
  29. Glossary
  30. Trivia and History
  31. Flashcard Review
  32. Drills
  33. Cheat Sheet
  34. Self-Assessment

Part 1: Why Python? The Decision Line

                BASH TERRITORY          |    PYTHON TERRITORY
                                        |
  One-liner file ops                    |    JSON/YAML/XML parsing
  Gluing 3–4 commands together          |    API calls with auth + retries
  Simple cron jobs                      |    Data structures beyond arrays
  Config file generation (heredocs)     |    Error handling with recovery
  Quick log tailing / grepping          |    Parallel execution
  Package install / service restart     |    Anything over ~100 lines
  Git hooks                             |    CSV/database operations
  Environment setup scripts             |    Unit tests / reusable libraries

The 100-Line Rule

If your Bash script passes 100 lines, ask: "Is this still glue, or is this logic?" Glue connects programs. Logic transforms data, makes decisions, handles errors. Bash is great glue. Bash is terrible logic.

Three Signals It's Time to Switch

  1. You're building data structures. declare -A and naming conventions like host_1_ip, host_1_port? Python's dicts and classes will save you hours.
  2. You're parsing structured data. If you're piping jq through awk back into jq, you're writing a bad Python script in Bash.
  3. You need error recovery, not just detection. set -e exits on failure. try/except catches specific errors, retries, falls back, logs context, and continues.

Mental Model: Bash is a text stream processor. Everything is a string. Every tool communicates via text piped between processes. Python is a data structure processor. You parse text into objects once, then work with real types — lists, dicts, integers, booleans. The moment your Bash script starts doing math on strings or building data structures with associative arrays, you've crossed the line into Python territory.

Etymology: Python was created by Guido van Rossum in 1991 and named after Monty Python's Flying Circus, not the snake. The language's design philosophy is captured in "The Zen of Python" (import this), which includes "Readability counts" and "There should be one — and preferably only one — obvious way to do it."


Part 2: Running the Thing

The REPL — Your New Scratch Terminal

# You already do this all day:
$ echo "hello"
hello

# Python has the same thing:
$ python3
>>> print("hello")
hello
>>> 2 + 2
4
>>> exit()

The >>> prompt is Python's interactive shell — the REPL (Read-Eval-Print Loop). It's your bash -c equivalent for testing one-liners.

Scripts and Shebangs

#!/usr/bin/env python3
print("I am a python script")

Same pattern as Bash: chmod +x script.py, run with ./script.py. The env trick finds whichever python3 is in your $PATH.

Gotcha: Outside a virtual environment, always use python3 — the bare python command is not uniform across Unix systems (per PEP 394, it may point to Python 2, Python 3, or not exist at all). Inside an activated virtual environment, python is fine and usually preferred — the venv guarantees it points to the correct interpreter. Python 2 reached end of life on January 1, 2020.

Quick One-Liners from the Shell

# Version check
python3 -c "import sys; print(sys.version)"

# Pretty-print JSON (no jq needed)
python3 -m json.tool < file.json

# Instant HTTP file server
python3 -m http.server 8000

# Generate a random password
python3 -c "import secrets; print(secrets.token_urlsafe(32))"

# Base64 encode
python3 -c "import base64; print(base64.b64encode(b'secret').decode())"

# Check if a module is installed
python3 -c "import boto3; print(boto3.__version__)"

Part 3: Variables and Types

In Bash, everything is a string. In Python, data has types:

name = "webserver"       # str (string)
count = 42               # int (integer)
uptime = 99.7            # float (decimal)
is_running = True        # bool (boolean — capital T)
last_error = None        # NoneType (like null — "nothing here")

No $ prefix. No quoting disasters. No declare -i. The value itself tells Python what type it is.

Type Conversion — Explicit Is Better

port = "8080"          # String from a config file
port_int = int(port)   # Now it's an integer
print(port_int + 1)    # 8081

int("not_a_number")    # ValueError — Python yells immediately

Python yells at you instead of silently doing the wrong thing. This is a feature.

Mental Model: Bash is text-in, text-out. You convert between types by piping strings through commands (bc, awk, printf). In Python, data carries its type with it, and you convert explicitly with int(), str(), float().
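Explicit conversion pairs naturally with the error handling covered later. A minimal sketch (the `to_int` helper name is my own, not from the standard library) showing how to convert config-file strings safely with a fallback instead of crashing:

```python
def to_int(value, default=None):
    """Convert a string to int, returning a default instead of raising."""
    try:
        return int(value.strip())
    except (ValueError, AttributeError):
        return default

print(to_int("8080"))        # 8080
print(to_int("not_a_port"))  # None
print(to_int(" 80 ", 0))     # 80
```

This is the Python equivalent of defensive `[[ $x =~ ^[0-9]+$ ]]` checks in Bash, except the invalid case is explicit rather than a silent empty string.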


Part 4: f-Strings

String formatting in Bash is a minefield of quoting rules. Python f-strings are the fix:

host = "web-03"
port = 8080
print(f"Connecting to {host}:{port}")                     # Variables
print(f"Status: {port + 1}")                              # Expressions
print(f"{'HOST':<20} {'PORT':>5}")                        # Alignment
print(f"Uptime: {99.734:.1f}%")                           # Decimal places
print(f"Size: {1048576:,} bytes")                         # Thousands separator

Anything inside {} in an f-string is a Python expression. Variables, math, function calls — all valid. No more escaping nested quotes inside $() inside double quotes inside heredocs.

History: f-strings were introduced in Python 3.6 (2016) via PEP 498. The older formats — % formatting (from C's printf) and .format() — still work but are more verbose. f-strings won because they put the value right next to where it appears.


Part 5: Data Structures

Lists — Arrays That Actually Work

servers = ["web-01", "web-02", "db-01"]
print(len(servers))               # 3
print(servers[0])                 # "web-01"
print(servers[-1])                # "db-01" (negative indexing!)
servers.append("cache-01")        # Append

# Filter (list comprehension)
high_ports = [p for p in [80, 443, 8080, 9090] if p > 1024]  # [8080, 9090]

# Check membership
if 443 in [80, 443, 8080]:
    print("HTTPS is configured")

List comprehensions ([x for x in items if condition]) are Python's pipeline filters — they read left to right, same as cmd | grep condition.

Dicts — The Data Structure That Replaces Half Your Scripts

counts = {
    "sshd": 47,
    "cron": 12,
    "kernel": 8,
}

print(counts["sshd"])              # 47
print(counts.get("nginx", 0))     # 0 (default if missing — no crash)

for program, count in counts.items():
    print(f"{program}: {count}")

# Sort by value
for prog, n in sorted(counts.items(), key=lambda x: x[1], reverse=True):
    print(f"{prog:<15} {n:>5}")

Key insight: dict.get(key, default) returns the default instead of crashing on a missing key. This eliminates the most common dict-related bug.

Counter — The awk Killer

from collections import Counter

# The Bash way:
# awk '{print $5}' /var/log/syslog | cut -d'[' -f1 | sort | uniq -c | sort -rn

# The Python way:
counts = Counter()
with open("/var/log/syslog") as f:
    for line in f:
        parts = line.split()
        if len(parts) >= 5:
            program = parts[4].split("[")[0].rstrip(":")
            counts[program] += 1

for program, n in counts.most_common(20):
    print(f"{program:<20} {n:>6}")

Five Bash commands piped together → one Python data structure. And counts is an object you can query, filter, serialize to JSON, or combine with another Counter. The Bash pipeline gave you printed text. Python gave you data.

War Story: A team had a 120-line Bash script monitoring log volume across 15 services. It used nested for loops, three declare -A arrays, and had 9 bugs related to uninitialized array keys (Bash returns empty strings for missing keys, which silently breaks arithmetic). The Python rewrite used {service: Counter()} and was 40 lines. The three hardest bugs in the Bash version were impossible in Python because Counter() initializes missing keys to zero automatically.

defaultdict — Auto-Initializing Dicts

from collections import defaultdict

# Group log lines by status code
lines_by_status = defaultdict(list)
with open('access.log') as f:
    for line in f:
        status = line.split()[8]
        lines_by_status[status].append(line.rstrip())

# No "if key not in dict" boilerplate needed
print(f"Unique 500 errors: {len(lines_by_status['500'])}")

Sets — Fast Membership Tests

seen = {"web-01", "web-02"}
if "web-01" in seen:
    print("duplicate")
# Sets are O(1) lookup — use for membership tests and deduplication

# Dedup a list while preserving order (Python 3.7+)
hosts = ["web-01", "db-01", "web-01", "cache-01"]
unique = list(dict.fromkeys(hosts))  # ["web-01", "db-01", "cache-01"]

Tuples — Immutable Lists

# Tuples can't be changed after creation (immutable)
point = (10, 20)
host_port = ("web-01", 8080)

# Tuple unpacking — use instead of $1, $2
host, port = host_port
print(f"{host}:{port}")

# Multiple return values from functions
total, errors, warnings = analyze_log("/var/log/syslog")

Part 6: Control Flow

Conditionals — No More Bracket Roulette

if status == "running":
    print("Service is up")
elif count > 10:
    print("Above threshold")
else:
    print("Something else")

No then. No fi. No semicolons. No brackets. Indentation defines the block.

Use is None for None checks, not == None. is tests identity, == tests equality. None is a singleton, so is is both faster and semantically correct:

result = get_server_status()
if result is None:       # Correct
    print("No response")
if result is not None:   # Correct negation
    print(f"Got: {result}")

Truthiness

# These are "falsy" (treated as False in if statements)
False, 0, 0.0, "", [], {}, None

# Everything else is "truthy"
# So you can write:
if servers:             # True if list is non-empty
    print("We have servers")
if name:                # True if string is non-empty
    print(f"Hello, {name}")

Loops

# Iterate a list
for server in ["web-01", "web-02", "db-01"]:
    print(f"Checking {server}")

# Range
for i in range(10):      # 0 through 9
    print(i)

# enumerate — index + value (no manual counter)
for i, server in enumerate(servers):
    print(f"{i}: {server}")

# zip — walk two lists in lockstep
for host, port in zip(["web-01", "db-01"], [80, 5432]):
    print(f"{host}:{port}")

enumerate() and zip() are the two loop tools you'll use most.

File Reading

with open("/var/log/syslog") as f:
    for line in f:
        print(line, end="")     # end="" because line already has \n

The with keyword is a context manager — it automatically closes the file when you're done, even if an error occurs. It's Python's version of trap cleanup EXIT, except scoped to one block.
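You can build your own context managers for any setup/teardown pair. A minimal sketch using `contextlib.contextmanager` (the `scratch_dir` name is my own) that mirrors `tmp=$(mktemp -d); trap 'rm -rf $tmp' EXIT`:

```python
import shutil
import tempfile
from contextlib import contextmanager

@contextmanager
def scratch_dir():
    """Temp working directory, removed on exit — block-scoped trap cleanup."""
    path = tempfile.mkdtemp()
    try:
        yield path               # Code inside the with-block runs here
    finally:
        shutil.rmtree(path)      # Always runs, even if the block raised

with scratch_dir() as tmp:
    print(f"working in {tmp}")
# The directory is gone here, error or not
```

Unlike a Bash `trap`, which is global to the script, the cleanup here is scoped to exactly one block and composes — you can nest several `with` statements without them clobbering each other.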


Part 7: Functions

Bash functions return exit codes (0–255) and communicate via stdout. Python functions return actual values:

def get_disk_usage(path="/"):
    import shutil
    total, used, free = shutil.disk_usage(path)
    return round(used / total * 100, 1)    # Returns a float, not a string

usage = get_disk_usage()          # No subshell. No text parsing.
print(f"Usage: {usage}%")

# Named arguments with defaults
def check_port(host, port, timeout=3):
    import socket
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
        sock.close()
        return True
    except (ConnectionRefusedError, TimeoutError):
        return False

# Multiple return values
def analyze_log(path):
    errors = warnings = total = 0
    with open(path) as f:
        for line in f:
            total += 1
            if "ERROR" in line: errors += 1
            elif "WARN" in line: warnings += 1
    return total, errors, warnings

total, errors, warnings = analyze_log("/var/log/syslog")

| Feature          | Bash                           | Python                               |
|------------------|--------------------------------|--------------------------------------|
| Return data      | echo + capture with $()        | return value                         |
| Return type      | Always a string (stdout)       | Any type: int, str, list, dict, bool |
| Arguments        | Positional: $1, $2             | Named, with defaults                 |
| Scope            | Global by default (need local) | Local by default                     |
| Multiple returns | Impossible                     | return a, b, c (tuple)               |

Under the Hood: In Bash, result=$(my_function) runs the function in a subshell. Variable changes inside $() disappear when the subshell exits. This is the #1 source of "why didn't my variable update?" bugs. Python functions share the same process and return data directly.

Type Hints and Dataclasses

# Type hints improve readability and tooling
def classify_load(value: float) -> str:
    if value >= 10:
        return "critical"
    if value >= 5:
        return "warning"
    return "ok"

# Dataclasses for structured data
from dataclasses import dataclass, field

@dataclass
class Host:
    name: str
    address: str
    port: int = 22
    tags: list[str] = field(default_factory=list)

Use type hints on function boundaries first — that gets most of the value. Use dataclasses when the data shape matters (named fields, sane defaults, fewer typo bugs).
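A short usage sketch of the Host dataclass defined above (the example hosts are hypothetical), showing what the generated constructor and `asdict` give you for free:

```python
from dataclasses import dataclass, field, asdict

@dataclass
class Host:
    name: str
    address: str
    port: int = 22
    tags: list[str] = field(default_factory=list)

web = Host("web-01", "10.0.1.5", tags=["frontend"])
db = Host("db-01", "10.0.2.9", port=5432)

print(web.port)    # 22 — the default kicked in
print(asdict(db))  # Plain dict, ready for json.dumps()
```

Compare with the `declare -A host_1_ip`/`host_1_port` naming-convention hack from Part 1: here each host is one typed object, and a typo like `web.prot` fails loudly instead of silently reading an empty string.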

Mutable Default Arguments

# BAD: mutable default argument (shared across all calls!)
def add_host(name, tags=[]):
    tags.append(name)
    return tags

# GOOD: use None and create fresh
def add_host(name: str, tags: list[str] | None = None) -> list[str]:
    tags = [] if tags is None else tags
    tags.append(name)
    return tags

The default [] is created once at function definition time and shared across every call. Call add_host("a") then add_host("b") and the second call returns ["a", "b"]. This is one of Python's most common footguns.


Part 8: File I/O

# Read entire file
with open("/etc/hostname") as f:
    content = f.read().strip()

# Process line by line (memory-efficient for huge files)
with open("/etc/passwd") as f:
    for line in f:
        parts = line.strip().split(":")
        print(f"{parts[0]:<20} {parts[-1]}")

# Write to a file
with open("/tmp/config.txt", "w") as f:
    f.write("server=web-03\n")
    f.write("port=8080\n")

| Mode | Meaning            | Bash Equivalent |
|------|--------------------|-----------------|
| "r"  | Read (default)     | < file          |
| "w"  | Write (truncates!) | > file          |
| "a"  | Append             | >> file         |

Gotcha: "w" mode truncates the file immediately on open — before you write anything. For atomic writes, write to a temp file then rename it (same pattern as safe config updates in Bash).

Atomic File Writes

import os
import tempfile
from pathlib import Path

def atomic_write(path, content):
    """Write content to a file atomically — safe for config files."""
    path = Path(path)
    fd, tmp_path = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:   # Reuse the descriptor mkstemp opened
            f.write(content)
        os.replace(tmp_path, path)      # Atomic on the same filesystem
    except Exception:
        Path(tmp_path).unlink(missing_ok=True)
        raise

Part 9: Error Handling

In Bash, error handling is set -e and prayer. Python has try/except — structured, specific, and reliable:

try:
    with open("/var/log/syslog") as f:
        content = f.read()
except FileNotFoundError:
    print("Syslog not found (are you on macOS?)")
    content = ""
except PermissionError:
    print("Permission denied — run with sudo?")
    content = ""

# Catch, log, and continue (what set -e can't do)
results = {}
for server in ["web-01", "web-02", "web-03"]:
    try:
        results[server] = check_server(server)
    except ConnectionError as e:
        print(f"WARN: {server} unreachable: {e}")
        results[server] = None
        # Script continues instead of dying

The Exception Hierarchy You Need

BaseException
 └── Exception
      ├── OSError
      │    ├── FileNotFoundError  # File doesn't exist
      │    ├── PermissionError    # Can't read/write
      │    ├── ConnectionError    # Network failure
      │    └── TimeoutError       # Operation timed out
      ├── ValueError              # Wrong value (int("abc"))
      ├── KeyError                # Dict key doesn't exist
      ├── IndexError              # List index out of range
      └── TypeError               # Wrong type (len(42))

KeyboardInterrupt (Ctrl+C) derives from BaseException, not Exception — so a bare except Exception: won't swallow Ctrl+C. That's deliberate.

try / except / else / finally

try:
    f = open("/var/log/syslog")
    data = f.read()
except FileNotFoundError:
    print("File missing")
    data = ""
else:
    # Only runs if NO exception occurred
    print(f"Read {len(data)} bytes")
finally:
    # ALWAYS runs — cleanup goes here
    print("Done")

block/rescue Equivalent

# Ansible-style block/rescue pattern
try:
    deploy_application()
    verify_health()
except Exception:
    rollback_application()
    notify_team("Deploy failed, rolled back")
finally:
    log_deployment_attempt()

Mental Model: set -e is a fire alarm — when something goes wrong, everybody evacuates. try/except is a fire extinguisher — you identify what's burning, put it out, and keep working.
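The "retries, falls back, continues" capability from the three-signals list can be packaged once and reused. A minimal sketch (the `with_retries` helper and its parameters are my own, not a standard-library API) of retry with exponential backoff built from plain try/except:

```python
import time

def with_retries(func, attempts=3, base_delay=1.0):
    """Retry a flaky operation with exponential backoff — what set -e can't do."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except (ConnectionError, TimeoutError) as e:
            if attempt == attempts:
                raise                                # Out of attempts: re-raise
            delay = base_delay * 2 ** (attempt - 1)  # 1s, 2s, 4s, ...
            print(f"WARN: attempt {attempt} failed ({e}), retrying in {delay}s")
            time.sleep(delay)
```

Usage would look like `status = with_retries(lambda: check_server("web-01"))`. Note it only catches the transient network errors; a ValueError from a genuine bug still crashes immediately, which is what you want.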


Part 10: String Methods

Python strings have methods that replace most sed/awk/cut one-liners:

line = "  Mar 23 04:12:03 web-prod-03 sshd[28410]: Failed password  "

line.strip()                          # Remove leading/trailing whitespace
line.split()                          # Split on whitespace (like awk default)
line.split(":")                       # Split on colons (like cut -d':')
", ".join(["web-01", "web-02"])       # "web-01, web-02"
"sshd[28410]:".startswith("sshd")     # True
"Hello World".replace("World", "Ops") # "Hello Ops"
"Failed password" in line             # True (substring check)
"warning".upper()                     # "WARNING"

# Chaining — extract program name from syslog line
# Bash: echo "$line" | awk '{print $5}' | cut -d'[' -f1 | tr -d ':'
program = line.split()[4].split("[")[0].rstrip(":")
# "sshd"

One line, no pipes, no subprocesses, and it returns a Python string you can use directly in a dict, comparison, or f-string.

Fact: Python strings are immutable — every method returns a new string. The original is unchanged. This means you can never accidentally corrupt data by modifying it in two places at once.


Part 11: Imports and the Standard Library

# Standard library — ships with Python, no install required
import os                    # OS interactions (env vars, paths, PIDs)
import sys                   # Python runtime (args, exit, stdin/stdout)
import json                  # JSON parsing/writing
import re                    # Regular expressions
import subprocess            # Run shell commands
import datetime              # Dates and times
import collections           # Counter, defaultdict
import socket                # Low-level networking
import shutil                # File copying, disk usage
import csv                   # CSV file reading/writing
import hashlib               # Hashing (md5, sha256)
import argparse              # CLI argument parsing
import logging               # Structured logging
import tempfile              # Temporary files
from pathlib import Path     # File path operations

# Third-party — install with pip
import requests              # HTTP requests (better curl)
import yaml                  # YAML parsing
import boto3                 # AWS SDK

"Batteries included" — Python's standard library ships with modules for nearly everything. The phrase was coined in the late 1990s. On a server where you can't install packages, you still have json, csv, re, pathlib, subprocess, logging, and more.

The Main Guard

#!/usr/bin/env python3

def main():
    print("This only runs when executed directly")

if __name__ == "__main__":
    main()

When Python runs a file directly, __name__ is "__main__". When imported as a module, it's the module name. This lets your file work as both a script and a reusable library.


Part 12: subprocess — The Escape Hatch

import subprocess

# Simple command
result = subprocess.run(
    ["df", "-h", "/"],
    capture_output=True,
    text=True,
    check=True,          # Raise on non-zero exit
)
print(result.stdout)

# Parse JSON output from a command
result = subprocess.run(
    ["kubectl", "get", "pods", "-o", "json"],
    capture_output=True, text=True, check=True,
)
pods = json.loads(result.stdout)

# With timeout
result = subprocess.run(
    ["helm", "list"],
    capture_output=True, text=True, timeout=30,
)

# Stream output in real time
process = subprocess.Popen(
    ["ansible-playbook", "site.yml"],
    stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True,
)
for line in process.stdout:
    print(line, end='')

The shell=True Footgun

# NEVER DO THIS with user input:
subprocess.run(f"ping -c 1 {hostname}", shell=True)  # Shell injection!
# What if hostname is "google.com; rm -rf /"?

# SAFE: pass arguments as a list
subprocess.run(["ping", "-c", "1", hostname])

When to Use subprocess vs Native Python

| Use subprocess           | Use native Python         |
|--------------------------|---------------------------|
| systemctl restart nginx  | Parsing a file (open())   |
| iptables -L              | HTTP requests (requests)  |
| docker ps                | JSON parsing (json)       |
| git log --oneline        | File operations (pathlib) |
| aws CLI (quick one-offs) | String matching (re)      |

Gotcha: The most common mistake when Bash experts start writing Python is calling subprocess.run() for everything. If you're writing subprocess.run(["grep", "ERROR", logfile]), you're paying Python's overhead without getting its benefit. Open the file yourself and test "ERROR" in line — no child process needed.
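A minimal sketch of that native replacement (the `count_errors` name is my own), which returns an integer you can threshold on instead of text you'd have to re-parse:

```python
def count_errors(logfile):
    """Count ERROR lines natively — no child process, and you get data back."""
    with open(logfile) as f:
        return sum(1 for line in f if "ERROR" in line)
```

The generator expression inside sum() streams the file line by line, so this stays memory-efficient even on multi-gigabyte logs — the same property as grep, without shelling out.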


Part 13: pathlib — Files Without the Pain

from pathlib import Path

# Path manipulation (no more os.path.join())
config = Path("/etc/myapp") / "conf.d" / "upstream.yaml"
backup = config.with_suffix(".yaml.bak")

# Properties
config.parent     # PosixPath('/etc/myapp/conf.d')
config.name       # 'upstream.yaml'
config.stem       # 'upstream'
config.suffix     # '.yaml'

# Check existence
config.exists()
config.is_file()
config.is_dir()

# Read/write
content = config.read_text()
config.write_text("new content")

# Create directories
Path("/backup/myapp").mkdir(parents=True, exist_ok=True)

# Find files (like find command)
for log in Path("/var/log").glob("*.log"):
    size_mb = log.stat().st_size / (1024 * 1024)
    if size_mb > 100:
        print(f"Large log: {log} ({size_mb:.1f} MB)")

# Recursive glob
for yaml_file in Path("/etc").rglob("*.yaml"):
    print(yaml_file)

Trivia: The / operator for paths was added in Python 3.4 (2014). It works by overriding __truediv__, the same method that handles a / b for numbers.


Part 14: JSON and YAML

JSON

import json

# Parse from string
data = json.loads('{"status": "healthy", "uptime": 84600}')

# Parse from file
with open("response.json") as f:
    data = json.load(f)

# Write (pretty-printed)
print(json.dumps(data, indent=2))

# Navigate nested structures
pod_name = data["metadata"]["name"]
node = data["status"].get("hostIP", "unknown")  # Safe with default

YAML

import yaml  # pip install pyyaml

# Read a Kubernetes manifest
with open("deployment.yaml") as f:
    manifest = yaml.safe_load(f)

# Multi-document YAML (multiple --- separated docs)
with open("all-resources.yaml") as f:
    for doc in yaml.safe_load_all(f):
        if doc:
            print(f"{doc.get('kind')}: {doc['metadata']['name']}")

Security: Always use yaml.safe_load(), never yaml.load(). The unsafe version can execute arbitrary Python code embedded in YAML. This is a known attack vector, not a theoretical risk. If you see yaml.load(f) without a Loader argument, that's a security bug.

YAML's Type Surprises

gotchas = yaml.safe_load("""
norway_code: NO       # boolean False (YAML 1.1!)
version: 1.10         # float 1.1 (trailing zero dropped!)
port: 8080            # integer (fine)
""")
# "NO" becomes False, "1.10" becomes 1.1
# Always quote strings that could be misinterpreted

The Norway Problem: Country code NO being parsed as boolean False has caused real deployment failures. YAML 1.2 fixed this, but PyYAML still implements YAML 1.1.

TOML

Python 3.11+ includes tomllib in the standard library for reading TOML files (used by pyproject.toml):

import tomllib

with open("pyproject.toml", "rb") as f:
    config = tomllib.load(f)
print(config["project"]["name"])

Note: tomllib is read-only. If you need to write TOML, use the third-party tomli-w package.


Part 15: requests — The Better curl

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Basic GET
response = requests.get("http://api.internal:8080/health", timeout=10)
print(response.status_code)   # 200
print(response.json())        # Parsed JSON as a dict

# POST with JSON body
response = requests.post(
    "https://api.example.com/deploy",
    json={"version": "v2.0", "env": "prod"},
    headers={"Authorization": "Bearer mytoken"},
    timeout=10,
)
response.raise_for_status()   # Raises HTTPError for 4xx/5xx

Sessions with Retries — The Non-Negotiable Pattern

def get_session(retries=3, backoff_factor=0.5, timeout=10):
    """Create a requests session with automatic retries."""
    session = requests.Session()
    retry = Retry(
        total=retries,
        backoff_factor=backoff_factor,     # 0.5s, 1s, 2s between retries
        status_forcelist=[500, 502, 503, 504],
        allowed_methods=["GET", "HEAD"],
    )
    adapter = HTTPAdapter(max_retries=retry)
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session

session = get_session()
data = session.get("http://prometheus.internal:9090/api/v1/targets", timeout=10).json()

War Story: A monitoring script checked 40 endpoints every 60 seconds with no retry logic. During a 3-second 502 from a routine deploy, it fired 40 "service down" alerts to Slack and paged the on-call at 3 AM. After adding retries with 2-second backoff, false alerts dropped 90%.

Gotcha: requests.get(url) with no timeout blocks forever if the server doesn't respond. Your cron job piles up. You now have 30 zombie Python processes. Always set timeout=.


Part 16: Regular Expressions

import re

# Simple match (like grep)
if re.search(r"Failed password", line):
    print("SSH failure detected")

# Extract groups (like sed capture groups)
match = re.search(r"from (\d+\.\d+\.\d+\.\d+) port (\d+)", line)
if match:
    ip = match.group(1)
    port = match.group(2)

# Find all matches
ips = re.findall(r"\d+\.\d+\.\d+\.\d+", log_content)

# Replace (like sed 's/old/new/g')
cleaned = re.sub(r"\s+", " ", messy_text)

# Compile for performance (reuse the pattern)
LOG_PATTERN = re.compile(r"^(\S+) - - \[(.+?)\] \"(\S+) (\S+)")
for line in open("access.log"):
    match = LOG_PATTERN.match(line)
    if match:
        ip, timestamp, method, path = match.groups()
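
Named capture groups — `(?P<name>...)`, a standard `re` feature — make the same extraction self-documenting; you get fields by name instead of position. A sketch on a hypothetical access-log line:

```python
import re

LOG_PATTERN = re.compile(
    r"^(?P<ip>\S+) - - \[(?P<ts>.+?)\] \"(?P<method>\S+) (?P<path>\S+)"
)

line = '10.0.1.5 - - [23/Mar/2025:04:12:03 +0000] "GET /health HTTP/1.1" 200 512'
match = LOG_PATTERN.match(line)
if match:
    print(match.group("ip"), match.group("path"))  # Fields by name, not index
    print(match.groupdict())                       # The whole match as a dict
```

`groupdict()` is especially handy in infrastructure scripts: it hands you a dict you can feed straight into json.dumps() or a Counter.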

Part 17: Logging

import logging
import sys

def setup_logging(verbose=False):
    level = logging.DEBUG if verbose else logging.INFO
    logging.basicConfig(
        level=level,
        format='%(asctime)s %(levelname)s %(message)s',
        datefmt='%Y-%m-%d %H:%M:%S',
        handlers=[logging.StreamHandler(sys.stderr)],
    )
    return logging.getLogger(__name__)

log = setup_logging()
log.info("Starting backup for %d hosts", len(hosts))
log.warning("Host %s unreachable: %s", host, error)
log.error("Backup failed: %s", str(e))

JSON Logging (for Monitoring Pipelines)

import json
import sys
from datetime import datetime, timezone

def log_json(event, **kwargs):
    entry = {'event': event, 'timestamp': datetime.now(timezone.utc).isoformat()}
    entry.update(kwargs)
    print(json.dumps(entry), file=sys.stderr)

log_json("backup_complete", hosts=5, duration_seconds=142)

Note: datetime.utcnow() is deprecated as of Python 3.12. Use datetime.now(timezone.utc) instead — it returns a timezone-aware datetime, which prevents a whole class of "naive vs aware" comparison bugs.


Part 18: CLI Tools

argparse (Standard Library)

#!/usr/bin/env python3
"""Morning infrastructure health check."""
import argparse
import os

def main():
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument('--env', required=True, choices=['dev', 'staging', 'prod'])
    parser.add_argument('--dry-run', action='store_true')
    parser.add_argument('--verbose', '-v', action='count', default=0)
    parser.add_argument('--timeout', type=int,
                        default=int(os.environ.get('TIMEOUT', '30')))
    args = parser.parse_args()
    # args.env, args.dry_run, args.verbose, args.timeout

if __name__ == '__main__':
    main()

Click (Third-Party, More Powerful)

import click

@click.group()
@click.option('--verbose', '-v', is_flag=True)
@click.pass_context
def cli(ctx, verbose):
    """Infrastructure management tool."""
    ctx.ensure_object(dict)
    ctx.obj['verbose'] = verbose

@cli.command()
@click.argument('environment', type=click.Choice(['dev', 'staging', 'prod']))
@click.option('--region', '-r', default='us-east-1')
def list_servers(environment, region):
    """List servers in an environment."""
    servers = get_instances_by_tag('Environment', environment)
    for s in servers:
        click.echo(f"{s['id']:<22} {s['ip']:<16} {s['type']}")

@cli.command()
@click.argument('instance_id')
@click.confirmation_option(prompt='Stop this instance?')
def stop(instance_id):
    """Stop an EC2 instance."""
    stop_instance(instance_id)

Configuration Precedence Pattern

CLI flags  →  override  →  Environment variables  →  override  →  Config file  →  Defaults
(highest)                                                                        (lowest)

This matches how every serious CLI tool (kubectl, aws, terraform) works.
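A minimal sketch of that precedence chain, merging one layer at a time so later layers win. The MYTOOL_* variable names and the config.json path are made up for illustration:

```python
import argparse
import json
import os

DEFAULTS = {'region': 'us-east-1', 'timeout': 30}

def load_config(path='config.json', argv=None):
    """Merge defaults <- config file <- env vars <- CLI flags (highest wins)."""
    config = dict(DEFAULTS)

    # Layer 2: config file overrides defaults (if present)
    if os.path.exists(path):
        with open(path) as f:
            config.update(json.load(f))

    # Layer 3: environment variables override the file
    if 'MYTOOL_REGION' in os.environ:
        config['region'] = os.environ['MYTOOL_REGION']
    if 'MYTOOL_TIMEOUT' in os.environ:
        config['timeout'] = int(os.environ['MYTOOL_TIMEOUT'])

    # Layer 4: CLI flags override everything; default=None means "not given"
    parser = argparse.ArgumentParser()
    parser.add_argument('--region')
    parser.add_argument('--timeout', type=int)
    args, _ = parser.parse_known_args(argv)
    config.update({k: v for k, v in vars(args).items() if v is not None})
    return config
```

The key trick is that unset CLI flags default to None and are filtered out of the final `update()`, so an absent flag never clobbers a value from a lower layer.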


Part 19: boto3 — Automating AWS

import boto3
from botocore.exceptions import ClientError

ec2 = boto3.client('ec2', region_name='us-east-1')
s3 = boto3.client('s3')

# List instances by tag
def get_instances_by_tag(tag_key, tag_value):
    response = ec2.describe_instances(
        Filters=[
            {'Name': f'tag:{tag_key}', 'Values': [tag_value]},
            {'Name': 'instance-state-name', 'Values': ['running']},
        ]
    )
    instances = []
    for reservation in response['Reservations']:
        for instance in reservation['Instances']:
            instances.append({
                'id': instance['InstanceId'],
                'ip': instance.get('PrivateIpAddress'),
                'type': instance['InstanceType'],
            })
    return instances

# CRITICAL: Paginate all list operations
def list_all_s3_objects(bucket, prefix=''):
    paginator = s3.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get('Contents', []):
            yield obj['Key'], obj['Size']

# Error handling
try:
    ec2.stop_instances(InstanceIds=[instance_id])
except ClientError as e:
    if e.response['Error']['Code'] == 'InvalidInstanceID.NotFound':
        print(f"Instance {instance_id} not found")
    else:
        raise

Gotcha: boto3 reads credentials in this order: (1) explicit parameters, (2) environment variables, (3) ~/.aws/credentials, (4) EC2 instance metadata / ECS task role. Never hardcode credentials in code.

Gotcha: AWS APIs return at most 100-1000 results per call. If you have 5,000 instances and don't paginate, you only see the first 1,000. Use paginators for every describe_*, list_*, get_* call.

Trivia: boto3 is routinely the most-downloaded package on PyPI, pulling in hundreds of millions of downloads per month.


Part 20: paramiko — SSH from Python

import paramiko

def run_remote_command(host, user, key_path, command):
    """Run a command on a remote host via SSH."""
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    try:
        client.connect(hostname=host, username=user,
                       key_filename=key_path, timeout=10)
        stdin, stdout, stderr = client.exec_command(command, timeout=30)
        exit_code = stdout.channel.recv_exit_status()
        return {
            'host': host,
            'stdout': stdout.read().decode().strip(),
            'stderr': stderr.read().decode().strip(),
            'exit_code': exit_code,
        }
    finally:
        client.close()   # Always close, even on exception

Gotcha: If you don't close paramiko connections on exceptions, after 200 hosts you hit the file descriptor limit. Always use try/finally.

Security: In production, avoid AutoAddPolicy() — it accepts any host key without verification, making you vulnerable to MITM attacks. Use RejectPolicy() or WarningPolicy() and manage known hosts properly (e.g., client.load_system_host_keys()).


Part 21: Jinja2 — Templating Config Files

from jinja2 import Template, Environment, FileSystemLoader

# Inline template
tmpl = Template("Hello {{ name }}")
print(tmpl.render(name="world"))

# From files
env = Environment(loader=FileSystemLoader('templates/'),
                  trim_blocks=True, lstrip_blocks=True)
tmpl = env.get_template('nginx.conf.j2')
config = tmpl.render(
    service_name='myapp',
    backends=[
        {'ip': '10.0.1.10', 'port': 8080, 'weight': 100},
        {'ip': '10.0.1.11', 'port': 8080, 'weight': 100},
    ],
    domain='app.example.com',
)

# K8s manifest generation
K8S_TEMPLATE = Template("""
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ name }}
spec:
  replicas: {{ replicas }}
  template:
    spec:
      containers:
      - name: {{ name }}
        image: {{ image }}:{{ tag }}
""")

for svc in services:
    print("---")
    print(K8S_TEMPLATE.render(**svc))

Part 22: Concurrency

The GIL (Global Interpreter Lock)

The GIL allows only one thread to execute Python bytecode at a time. But it's released during I/O (network, disk, sleep). So:

| Workload | GIL Impact | Right Tool |
|---|---|---|
| I/O-bound (HTTP, SSH, file I/O) | Minimal | threading or asyncio |
| CPU-bound (math, parsing) | Severe — threads give zero speedup | multiprocessing |

Key fact for DevOps: Infrastructure scripts are almost always I/O-bound. The GIL does not matter for your work. Threading works great.

ThreadPoolExecutor — Parallel Fleet Operations

from concurrent.futures import ThreadPoolExecutor, as_completed

def check_host_health(host):
    try:
        resp = requests.get(f'http://{host}:8080/health', timeout=5)
        return {'host': host, 'healthy': resp.ok}
    except requests.exceptions.RequestException as e:
        return {'host': host, 'healthy': False, 'error': str(e)}

def parallel_health_check(hosts, max_workers=20):
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        future_to_host = {
            executor.submit(check_host_health, host): host
            for host in hosts
        }
        for future in as_completed(future_to_host):
            results.append(future.result())
    return results

# 200 hosts checked in parallel — seconds instead of minutes
results = parallel_health_check(all_hosts, max_workers=30)
unhealthy = [r for r in results if not r['healthy']]

Fleet Operation Pattern

def fleet_operation(hosts, operation, max_workers=20, fail_fast=False):
    """Run an operation across a fleet of hosts in parallel."""
    results = {'success': [], 'failed': []}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(operation, host): host for host in hosts}
        for future in as_completed(futures):
            host = futures[future]
            try:
                result = future.result()
                results['success'].append({'host': host, 'result': result})
            except Exception as e:
                results['failed'].append({'host': host, 'error': str(e)})
                if fail_fast:
                    executor.shutdown(wait=False, cancel_futures=True)
                    break
    return results

Part 23: Data Wrangling — Log Parsing at Scale

Reading Large Files Without Killing the Server

# BAD: loads 800 MB into RAM (plus ~3x object overhead)
lines = open('access.log').readlines()

# GOOD: streams line by line — constant memory
with open('access.log') as f:
    for line in f:
        process(line)

# Gzipped files
import gzip
with gzip.open('access.log.gz', 'rt', errors='replace') as f:
    for line in f:
        process(line)

Multi-Aggregation in One Pass

from collections import Counter
from pathlib import Path
import gzip

ip_counts = Counter()
status_counts = Counter()
endpoint_counts = Counter()

for log_file in sorted(Path('/var/log/nginx').glob('access.log.*.gz')):
    with gzip.open(log_file, 'rt', errors='replace') as f:
        for line in f:
            parts = line.split()
            if len(parts) >= 10:
                ip_counts[parts[0]] += 1
                status_counts[parts[8]] += 1
                endpoint_counts[parts[6]] += 1

# Three aggregations, one pass, constant memory
print("Top IPs:", ip_counts.most_common(10))
print("Status codes:", status_counts.most_common())

Generators for Memory-Efficient Pipelines

def grep_file(filepath, pattern):
    """Memory-efficient file search — like grep but returns structured data."""
    with open(filepath) as f:
        for line_num, line in enumerate(f, 1):
            if pattern in line:
                yield line_num, line.rstrip()

# Lazy evaluation — only processes what's needed
for num, line in grep_file("/var/log/syslog", "ERROR"):
    print(f"{num}: {line}")

Part 24: Kubernetes Client

from kubernetes import client, config  # pip install kubernetes

# Load kubeconfig
try:
    config.load_incluster_config()       # Inside a pod
except config.ConfigException:
    config.load_kube_config()            # From ~/.kube/config

v1 = client.CoreV1Api()

# List pods
pods = v1.list_namespaced_pod('default')
for p in pods.items:
    print(f"{p.metadata.name}: {p.status.phase}")

# Find CrashLoopBackOff pods across all namespaces
all_pods = v1.list_pod_for_all_namespaces()
for pod in all_pods.items:
    for cs in (pod.status.container_statuses or []):
        waiting = cs.state.waiting
        if waiting and waiting.reason == 'CrashLoopBackOff':
            print(f"  {pod.metadata.namespace}/{pod.metadata.name} "
                  f"({cs.name}) - {cs.restart_count} restarts")

Part 25: Debugging

print(f"DEBUG: variable = {variable!r}")  # !r prints repr() — quotes reveal '5' (str) vs 5 (int)

breakpoint() — The Built-In Debugger

def process_data(records):
    for record in records:
        breakpoint()  # Drops into pdb interactive debugger
        transform(record)

Essential pdb Commands

| Command | Short | What It Does |
|---|---|---|
| next | n | Execute next line (step over) |
| step | s | Step into function call |
| continue | c | Continue until next breakpoint |
| print expr | p expr | Print expression value |
| list | l | Show source code around current line |
| where | w | Show call stack |
| quit | q | Quit debugger |

# Disable all breakpoints via environment
PYTHONBREAKPOINT=0 python3 script.py

# Use ipdb instead of pdb (better UI)
PYTHONBREAKPOINT=ipdb.set_trace python3 script.py

Remote Debugging (Docker, Production)

import debugpy
debugpy.listen(("0.0.0.0", 5678))
print("Waiting for debugger...")
debugpy.wait_for_client()

Profiling

# Find slow functions
python3 -m cProfile -s cumtime myscript.py 2>&1 | head -30

# Check syntax without running
python3 -m py_compile myscript.py
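cProfile can also be driven from inside a script when you want to profile just one call instead of the whole program — a small sketch using only the standard library (the busy() function is a made-up workload):

```python
import cProfile
import io
import pstats

def busy():
    """A deliberately slow function to profile."""
    return sum(i * i for i in range(200_000))

pr = cProfile.Profile()
pr.enable()
busy()
pr.disable()

# Render the top 5 entries sorted by cumulative time
out = io.StringIO()
pstats.Stats(pr, stream=out).sort_stats('cumulative').print_stats(5)
print(out.getvalue())
```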

Part 26: Virtual Environments and Packaging

Virtual Environments

# Create
python3 -m venv .venv

# Activate
source .venv/bin/activate

# Install packages (goes into .venv only)
pip install requests boto3

# Freeze dependencies
pip freeze > requirements.txt

# Deactivate
deactivate

A venv is just a directory. Delete it and you're clean. Never commit .venv/ to git.

requirements.txt

# Direct dependencies (loose — for libraries)
requests>=2.28
flask>=3.0

# Pinned (reproducible — for applications)
requests==2.31.0
flask==3.0.2
Werkzeug==3.0.1

pip-tools — The Better Way

pip install pip-tools

# requirements.in — what you WANT
cat requirements.in
# requests>=2.28
# flask>=3.0

# Compile to pinned requirements.txt — what you GET
pip-compile requirements.in

# Install exactly what's pinned
pip-sync requirements.txt

pyproject.toml (Modern Standard)

[project]
name = "my-infra-tool"
version = "1.0.0"
requires-python = ">=3.11"
dependencies = [
    "requests>=2.28",
    "boto3>=1.28",
    "click>=8.0",
]

[project.scripts]
infra-check = "my_tool.cli:main"

uv — The Future

# uv is a Rust-based replacement for pip, pip-tools, virtualenv, and pyenv
# 10-100x faster than pip
pip install uv
uv pip install requests
uv venv .venv
uv pip compile requirements.in

Trivia: pip didn't exist until 2008. Before that, packages were installed with easy_install, which couldn't even uninstall packages. The name "pip" is recursive: "pip installs packages."


Part 27: Testing

# test_health.py
import pytest

def test_parse_health_response():
    response = {"status": "healthy", "uptime": 84600}
    assert response["status"] == "healthy"
    assert response["uptime"] > 0

def test_parse_unhealthy_response():
    response = {"status": "degraded", "errors": ["disk_full"]}
    assert response["status"] != "healthy"
    assert len(response["errors"]) > 0

def test_missing_key_uses_default():
    response = {}
    status = response.get("status", "unknown")
    assert status == "unknown"

# Run tests
pip install pytest
pytest test_health.py -v
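When several cases share the same assertion shape, pytest.mark.parametrize collapses them into a table of inputs and expected outputs. A small sketch — parse_status is a hypothetical helper under test, not part of the guide's earlier code:

```python
import pytest

def parse_status(response):
    """Return True only when a health-check payload reports 'healthy'."""
    return response.get("status", "unknown") == "healthy"

@pytest.mark.parametrize("payload,expected", [
    ({"status": "healthy"}, True),
    ({"status": "degraded"}, False),
    ({}, False),                       # missing key falls back to "unknown"
])
def test_parse_status(payload, expected):
    assert parse_status(payload) is expected
```

Each tuple becomes its own test case in the `pytest -v` output, so one failing input doesn't hide the others.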

Part 28: Footguns — Mistakes That Turn Automation Into Liability

1. Hardcoding AWS Credentials

Your aws_access_key_id in a Python file gets committed to Git. Someone runs trufflehog. AWS sends you a bill for 200 GPU instances mining crypto.

Fix: Use environment variables, AWS profiles, or IAM roles. boto3 checks ~/.aws/credentials and instance metadata automatically.

2. No Timeout on HTTP Requests

requests.get(url) with no timeout blocks forever. Your cron job piles up. 30 zombie Python processes consuming memory.

Fix: Always timeout=(5, 30) — 5s to connect, 30s to read.

3. subprocess with shell=True and User Input

Shell injection vulnerability. A hostname containing ; rm -rf / gets executed.

Fix: Pass arguments as a list: subprocess.run(["ping", "-c", "1", hostname]).

4. Not Paginating AWS API Calls

Script works in dev (10 instances), returns wrong results in prod (2,000 instances) — only first 1,000 visible.

Fix: Use paginators for every AWS list operation.

5. Non-Atomic File Writes

Process killed mid-write → half-written config → service crashes.

Fix: Write to temp file, then rename. rename() on same filesystem is atomic on Linux.
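A sketch of the pattern with the standard library: create the temp file in the destination directory (rename is only atomic within one filesystem), fsync before renaming, and clean up on failure:

```python
import os
import tempfile

def atomic_write(path, data):
    """Write data to path atomically: temp file beside it, then rename."""
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=dirname, prefix='.tmp-')
    try:
        with os.fdopen(fd, 'w') as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())       # data on disk before the rename
        os.replace(tmp_path, path)     # atomic on POSIX (same filesystem)
    except BaseException:
        os.unlink(tmp_path)            # never leave half-written temp files
        raise
```

Readers see either the old file or the complete new one — never a partial write.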

6. Catching Exception Instead of Specific Exceptions

except Exception: pass silently swallows NameErrors, ConnectionErrors, everything. Script reports success when it did nothing.

Fix: Catch specific exceptions. Let unexpected ones crash loudly.
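One way this looks in practice — a hypothetical read_config that handles exactly the two failures it expects and lets everything else crash:

```python
import json

def read_config(path):
    """Parse a JSON config; tolerate only the failures we planned for."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}                      # missing file: fall back to defaults
    except json.JSONDecodeError as e:
        raise SystemExit(f"Invalid JSON in {path}: {e}")
    # Anything else (PermissionError, IsADirectoryError, ...) propagates loudly
```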

7. Sequential Fleet Operations

SSHing into 500 servers one at a time. Each takes 2-3 seconds. Script takes 25 minutes.

Fix: ThreadPoolExecutor(max_workers=20). Same operation in under a minute.

8. Loading Entire Large Files into Memory

f.readlines() on a 5 GB file. Python uses ~15 GB (object overhead). OOM killer terminates your app.

Fix: Stream line by line: with open(path) as f: for line in f.

9. Using os.system Instead of subprocess

os.system("systemctl restart nginx") — can't capture stdout, can't get exit code reliably, can't handle arguments safely.

Fix: subprocess.run() with capture_output=True and check=True.

10. No Logging in Automation Scripts

Script runs via cron, fails. Nobody knows because it only printed to stdout and nobody reads root's email. Failing for 2 weeks.

Fix: Use logging module. Log to stderr with timestamps and severity levels.

11. yaml.load() Instead of yaml.safe_load()

Security vulnerability — can execute arbitrary Python code from YAML. Common audit finding.

Fix: Always yaml.safe_load().


Glossary

| Term | Definition |
|---|---|
| Python | Interpreted, dynamically-typed language. Named after Monty Python, not the snake |
| REPL | Read-Eval-Print Loop — the >>> interactive prompt |
| f-string | Formatted string literal: f"Hello {name}" (Python 3.6+) |
| list | Ordered, mutable collection: [1, 2, 3] |
| dict | Key-value mapping: {"host": "web-01", "port": 80} |
| tuple | Ordered, immutable collection: (1, 2, 3) |
| Counter | Dict subclass for counting: Counter(words).most_common(10) |
| defaultdict | Dict with auto-initialized missing keys |
| list comprehension | Inline list creation: [x for x in items if condition] |
| generator | Lazy iterator using yield — processes one item at a time |
| exception | Error object for control flow: try/except |
| context manager | with statement — guarantees cleanup (file close, lock release) |
| decorator | Function wrapper: @retry(max_attempts=3) |
| venv | Virtual environment — isolated per-project dependencies |
| pip | Package installer: pip install requests |
| module | Importable Python file |
| package | Directory of modules (has __init__.py) |
| GIL | Global Interpreter Lock — one thread runs Python bytecode at a time. Released during I/O |
| breakpoint() | Built-in to enter the debugger (Python 3.7+) |
| pdb | Python's built-in interactive debugger |
| pathlib | Object-oriented file path handling: Path("/etc") / "nginx" |
| boto3 | AWS SDK for Python |
| requests | HTTP library — the better curl |
| PyYAML | YAML parser. Always use safe_load() |
| Jinja2 | Templating engine: {{ variable }}, {% for %} |
| Click | Decorator-based CLI framework |
| argparse | Standard library CLI argument parser |
| subprocess | Run shell commands from Python. Never shell=True with user input |
| idempotent | Safe to run multiple times without changing result |
| timeout | Upper bound on waiting — prevents hangs. Always set one |

Trivia and History

  1. Named after comedy, not a snake. Guido van Rossum named Python after Monty Python's Flying Circus. The docs use "spam," "eggs," and "ham" as variable names (from the Monty Python sketch) instead of "foo" and "bar."

  2. Christmas 1989 hobby project. Guido started Python during Christmas week 1989 as a successor to the ABC language. First public release (0.9.0) came in February 1991.

  3. The Benevolent Dictator. Guido held the title "Benevolent Dictator for Life" (BDFL) until he resigned in July 2018 after the contentious PEP 572 (walrus operator :=) debate. Python is now governed by a five-person Steering Council.

  4. The Zen of Python. Type import this in a Python interpreter to see 19 aphorisms by Tim Peters (written in 1999, codified as PEP 20 in 2004). The 20th was intentionally left blank.

  5. Indentation by design. Python's significant whitespace was deliberate, inspired by ABC and Donald Knuth's literate programming. Guido argued that since programmers indent anyway, the language should enforce it.

  6. The GIL controversy. The Global Interpreter Lock (added in 1992) prevents true multi-threaded parallelism. PEP 703 (accepted 2023) began the multi-year project to make it optional (expected ~Python 3.15+).

  7. The 11-year migration. Python 3.0 was released December 2008. Python 2.7 was sunset January 1, 2020 — an 11-year transition that became a cautionary tale about breaking backward compatibility.

  8. Python was infra before it was web. Guido created Python as a system administration scripting language in 1991. Web frameworks (Django 2005, Flask 2010) came much later. Python's first major use was file management and system scripting.

  9. Ansible's secret weapon. Ansible chose Python because it's installed by default on virtually every Linux distribution. Modules execute using the system Python — no agent needed. This "agentless" architecture was only possible because of Python's ubiquity.

  10. subprocess replaced five modules. Python's subprocess (2004) unified os.system, os.spawn*, os.popen*, popen2.*, and commands.*. Despite this, os.system() still appears in code written in 2025.

  11. The GIL doesn't matter for infra. Infrastructure scripts are I/O-bound (waiting for SSH, APIs, files). I/O-bound code benefits from threading even with the GIL.

  12. pip didn't exist until 2008. Before pip, easy_install couldn't even uninstall packages. "pip" = "pip installs packages" (recursive acronym).

  13. Click powers many modern CLI tools. Click (2014) became the go-to framework for larger Python CLIs. Flask's own command-line interface, Datasette, and hundreds of DevOps tools are built on it.

  14. Python replaced Perl. Perl's TIMTOWTDI ("There's More Than One Way To Do It") lost to Python's "There should be one obvious way." Readability won.

  15. uv is rewriting Python tooling in Rust. uv (2024, by Astral/Ruff creators) is 10-100x faster than pip. Replaces pip, pip-tools, virtualenv, and pyenv.


Flashcard Review

Basics

| Q | A |
|---|---|
| What is Python (one line)? | Interpreted, dynamically-typed language. Named after Monty Python, runs everywhere |
| How do you run a Python script? | python3 script.py or ./script.py with shebang #!/usr/bin/env python3 |
| What are Python's basic types? | str, int, float, bool, None |
| What is an f-string? | Formatted string: f"Hello {name}" — any expression inside {} |
| if __name__ == "__main__": — what does it do? | Runs code only when file is executed directly, not when imported |

Data Structures

| Q | A |
|---|---|
| list vs tuple? | List is mutable [1,2,3], tuple is immutable (1,2,3) |
| What is dict.get(key, default)? | Returns default instead of raising KeyError on missing key |
| What does Counter.most_common(10) return? | List of (key, count) tuples, sorted by count descending |
| What does defaultdict(list) do? | Auto-creates empty list for missing keys — no "if key not in dict" needed |
| What is a list comprehension? | Inline filter/transform: [x for x in items if condition] |

Operations

| Q | A |
|---|---|
| try/except vs Bash set -e? | try/except catches specific errors with recovery. set -e just exits |
| with open(file) as f: — why with? | Guarantees file is closed even on exceptions (context manager) |
| subprocess.run() — why pass a list not a string? | Avoids shell injection. shell=True with user input is a security hole |
| Why always set timeout= on requests? | Without it, the call blocks forever if the server doesn't respond |
| yaml.safe_load() vs yaml.load()? | safe_load prevents code execution from YAML. load is a security vulnerability |

Infrastructure

| Q | A |
|---|---|
| How does boto3 find credentials? | Explicit params → env vars → ~/.aws/credentials → instance metadata |
| Why must you paginate AWS API calls? | APIs return max 100-1000 results. Without pagination, you miss the rest |
| requests.get() vs curl? | requests gives you sessions, retries, JSON parsing, proper error types |
| When do you use threading vs multiprocessing? | Threading for I/O-bound (HTTP, SSH). Multiprocessing for CPU-bound (math) |
| What does ThreadPoolExecutor(max_workers=20) do? | Runs up to 20 tasks in parallel using a thread pool |

Debugging and Packaging

| Q | A |
|---|---|
| How do you enter the Python debugger? | breakpoint() in code, or python3 -m pdb script.py |
| What is a virtual environment? | Isolated Python installation with its own packages. Created with python3 -m venv .venv |
| What does pip freeze do? | Dumps all installed packages with exact versions |
| What is pip-tools? | Separates what you want (requirements.in) from what you get (requirements.txt) |

Drills

Drill 1: Parse JSON API Response (Easy)

Q: Write a Python one-liner to fetch a URL and pretty-print the JSON response.

Answer
# stdlib only (no pip install needed):
python3 -c "import json,urllib.request; print(json.dumps(json.loads(urllib.request.urlopen('http://localhost:8080/health').read()),indent=2))"

# With requests:
import requests
print(requests.get('http://localhost:8080/health', timeout=10).json())

Drill 2: Read and Filter YAML (Easy)

Q: Read a Kubernetes YAML file and print all container image names.

Answer
import yaml

with open('deployment.yaml') as f:
    doc = yaml.safe_load(f)

for c in doc['spec']['template']['spec']['containers']:
    print(f"{c['name']}: {c['image']}")

Drill 3: subprocess Safely (Easy)

Q: Run kubectl get pods -o json from Python and list pod names with their status.

Answer
import subprocess, json

result = subprocess.run(
    ['kubectl', 'get', 'pods', '-o', 'json'],
    capture_output=True, text=True, check=True,
)
for pod in json.loads(result.stdout)['items']:
    print(f"{pod['metadata']['name']}: {pod['status']['phase']}")
Key: list args (not string), `capture_output=True`, `check=True`, `text=True`.

Drill 4: pathlib File Processing (Easy)

Q: Find all .yaml files in a directory tree and count total lines.

Answer
from pathlib import Path

total = 0
for f in Path('.').rglob('*.yaml'):
    lines = len(f.read_text().splitlines())
    total += lines
    print(f"{f}: {lines} lines")
print(f"\nTotal: {total} lines")

Drill 5: Environment Variables with Validation (Easy)

Q: Read config from environment variables with defaults and validation.

Answer
import os, sys

def require_env(name):
    val = os.environ.get(name)
    if not val:
        print(f"ERROR: {name} required", file=sys.stderr)
        sys.exit(1)
    return val

config = {
    'db_host': os.environ.get('DB_HOST', 'localhost'),
    'db_port': int(os.environ.get('DB_PORT', '5432')),
    'db_name': require_env('DB_NAME'),
    'debug': os.environ.get('DEBUG', 'false').lower() == 'true',
}

Drill 6: HTTP Health Check (Medium)

Q: Check health endpoints for multiple services and exit non-zero if any fail.

Answer
import urllib.request, sys

SERVICES = {
    'api': 'http://localhost:8080/health',
    'frontend': 'http://localhost:3000/health',
}

failures = []
for name, url in SERVICES.items():
    try:
        req = urllib.request.urlopen(url, timeout=5)
        status = "OK" if req.getcode() == 200 else "FAIL"
    except Exception:
        status = "FAIL"
        failures.append(name)
    print(f"  {name}: {status}")

sys.exit(1 if failures else 0)

Drill 7: Log Parsing with Counter (Medium)

Q: Parse nginx access logs and report top 10 IPs.

Answer
from collections import Counter

ip_counts = Counter()
with open('/var/log/nginx/access.log') as f:
    for line in f:
        ip_counts[line.split()[0]] += 1

for ip, count in ip_counts.most_common(10):
    print(f"  {ip}: {count}")

Drill 8: Jinja2 Templating (Medium)

Q: Generate Kubernetes manifests from a template for multiple services.

Answer
from jinja2 import Template

TMPL = Template("""apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ name }}
spec:
  replicas: {{ replicas }}
  template:
    spec:
      containers:
      - name: {{ name }}
        image: {{ image }}:{{ tag }}
""")

for svc in [
    {'name': 'api', 'replicas': 3, 'image': 'myapp/api', 'tag': 'v2.1'},
    {'name': 'worker', 'replicas': 2, 'image': 'myapp/worker', 'tag': 'v2.1'},
]:
    print("---")
    print(TMPL.render(**svc))

Drill 9: Retry Decorator (Medium)

Q: Write a retry decorator with exponential backoff.

Answer
import time, functools

def retry(max_attempts=3, base_delay=1, backoff_factor=2):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception as e:
                    if attempt == max_attempts:
                        raise
                    delay = base_delay * (backoff_factor ** (attempt - 1))
                    print(f"Attempt {attempt} failed: {e}. Retrying in {delay}s...")
                    time.sleep(delay)
        return wrapper
    return decorator

@retry(max_attempts=3, base_delay=2)
def call_api():
    return requests.get(url, timeout=10).json()

Drill 10: Kubernetes CrashLoopBackOff Detection (Hard)

Q: Use the Python Kubernetes client to find all CrashLoopBackOff pods.

Answer
from kubernetes import client, config

try:
    config.load_incluster_config()
except config.ConfigException:
    config.load_kube_config()

v1 = client.CoreV1Api()
for pod in v1.list_pod_for_all_namespaces().items:
    for cs in (pod.status.container_statuses or []):
        waiting = cs.state.waiting
        if waiting and waiting.reason == 'CrashLoopBackOff':
            print(f"  {pod.metadata.namespace}/{pod.metadata.name} "
                  f"({cs.name}) - {cs.restart_count} restarts")

Drill 11: Translate a Bash Pipeline (Medium)

Q: Translate to Python: cat /etc/passwd | grep -v '^#' | awk -F: '$7 !~ /nologin|false/ {print $1, $7}' | sort

Answer
results = []
with open("/etc/passwd") as f:
    for line in f:
        line = line.strip()
        if line.startswith("#") or not line:
            continue
        parts = line.split(":")
        if len(parts) >= 7:
            user, shell = parts[0], parts[6]
            if "nologin" not in shell and "false" not in shell:
                results.append((user, shell))

for user, shell in sorted(results):
    print(f"{user} {shell}")

Cheat Sheet

Bash → Python Rosetta Stone

| Bash | Python | Notes |
|---|---|---|
| $var | var | No prefix, no quoting |
| echo "$var" | print(f"{var}") | f-strings |
| ${#string} | len(string) | Works on lists, dicts, strings |
| $((x + 1)) | x + 1 | Math is native |
| [[ $a == $b ]] | a == b | No brackets |
| [ -f file ] | Path(file).is_file() | from pathlib import Path |
| declare -A | mydict = {} | First-class data structure |
| for x in ...; do | for x in ...: | Colon, not semicolon-do |
| while read line | for line in f: | File iteration |
| func() { ... } | def func(): ... | Indentation, not braces |
| $1, $2 | Named params | def f(host, port): |
| $(cmd) | subprocess.run(...) | Prefer native Python |
| cmd \| grep \| awk | for/if/split | Data stays in-process |
| set -e | try/except | Per-operation, specific |
| exit 1 | sys.exit(1) | Or raise exception |
| source file.sh | import module | Namespaced |
| sort \| uniq -c | Counter() | from collections import Counter |
| curl | requests | Sessions, retries, JSON |
| jq | json module | Native data structures |
| find -name '*.log' | Path.rglob('*.log') | Returns Path objects |
| mktemp + trap EXIT | with tempfile: | Cleanup guaranteed |

Quick Commands

# Pretty-print JSON
python3 -m json.tool < file.json

# HTTP server
python3 -m http.server 8000

# Check syntax
python3 -m py_compile script.py

# Profile performance
python3 -m cProfile -s cumtime script.py | head -30

# Create venv
python3 -m venv .venv && source .venv/bin/activate

# Generate password
python3 -c "import secrets; print(secrets.token_urlsafe(32))"

Self-Assessment

Core Language

  • I can write and run a Python script with a shebang
  • I understand types (str, int, float, bool, None) and explicit conversion
  • I can use f-strings for formatted output
  • I can use lists, dicts, tuples, and Counter
  • I understand list comprehensions
  • I can write functions with named arguments and defaults
  • I can use with open() for file I/O
  • I can use try/except for specific error handling
  • I know when Bash is still the right tool vs when to switch to Python

Infrastructure Libraries

  • I can use subprocess.run() safely (list args, check=True, no shell=True)
  • I can use pathlib for file operations instead of os.path
  • I can parse JSON and YAML (with safe_load)
  • I can use requests with sessions, retries, and timeouts
  • I can write a CLI tool with argparse or Click

Cloud and Automation

  • I can use boto3 with paginators and error handling
  • I can run parallel operations with ThreadPoolExecutor
  • I can use the Kubernetes Python client
  • I can generate config files with Jinja2
  • I understand the GIL and when threading vs multiprocessing applies

Production Readiness

  • I use logging instead of print for production scripts
  • I can set up virtual environments and pin dependencies
  • I can use breakpoint() and pdb for debugging
  • I know the major footguns (no timeout, shell=True, yaml.load, no pagination)
  • I can write basic pytest tests for my scripts