Python — The Complete Guide: From Zero to Infrastructure Automation

Topics: Python basics, data structures, file I/O, error handling, subprocess, pathlib, requests, boto3, paramiko, JSON/YAML, regex, argparse, logging, collections, concurrency, debugging, packaging, virtual environments, Jinja2, Click, Kubernetes client, testing
Strategy: Build-up from absolute zero (Bash comparison throughout) with war stories, trivia, and drills woven in
Level: L0–L2 (Zero → Foundations → Operations)
Time: 4–5 hours (designed for deep study in one or multiple sittings)
Prerequisites: Familiarity with the Linux command line and Bash scripting. No prior Python experience required.


The Mission

You're a terminal-native engineer. You've been writing Bash for years — maybe decades. Your scripts work. They glue infrastructure together with pipes, awk, and sed. But your latest 500-line Bash monitoring script needs JSON parsing, retry logic with exponential backoff, parallel execution across 50 hosts, Slack webhook integration, and proper error handling. You stare at the tangled quoting and subshell variable scoping and realize: this is where Bash stops being the right tool.

By the end of this guide you'll be able to write Python scripts that replace your Bash toolkit, automate cloud infrastructure with real APIs, process data at scale, debug production issues, and build CLI tools your team will actually use. This is the one document you need to go from "I only know Bash" to "I can automate anything in Python."


Table of Contents

  1. Why Python? The Decision Line
  2. Running the Thing — REPL, Scripts, Shebangs
  3. Variables and Types — Not Everything Is a String
  4. f-Strings — printf That Doesn't Hate You
  5. Data Structures — Lists, Dicts, and Counter
  6. Conditionals, Loops, and Truthiness
  7. Functions — No More Subshell Surprises
  8. File I/O — open(), with, and No More Redirects
  9. Error Handling — Why Exceptions Beat Exit Codes
  10. String Methods — Your sed/awk/cut Replacement
  11. The Import System and Standard Library
  12. subprocess — The Escape Hatch to Shell
  13. pathlib — Files Without the Pain
  14. JSON and YAML — The Languages Infrastructure Speaks
  15. requests — The Better curl
  16. Regular Expressions — grep/sed in Python
  17. Logging — Structured Output for Production
  18. CLI Tools — argparse and Click
  19. boto3 — Automating AWS
  20. paramiko — SSH from Python
  21. Jinja2 — Templating Config Files
  22. Concurrency — Threading, Multiprocessing, asyncio
  23. Data Wrangling — Log Parsing at Scale
  24. Kubernetes Client — Automating K8s
  25. Debugging — From print() to pdb to Production
  26. Virtual Environments and Packaging
  27. Testing — pytest for Infrastructure Scripts
  28. Footguns — Mistakes That Turn Automation Into Liability
  29. Glossary
  30. Trivia and History
  31. Flashcard Review
  32. Drills
  33. Cheat Sheet
  34. Self-Assessment

Part 1: Why Python? The Decision Line

                BASH TERRITORY          |    PYTHON TERRITORY
                                        |
  One-liner file ops                    |    JSON/YAML/XML parsing
  Gluing 3–4 commands together          |    API calls with auth + retries
  Simple cron jobs                      |    Data structures beyond arrays
  Config file generation (heredocs)     |    Error handling with recovery
  Quick log tailing / grepping          |    Parallel execution
  Package install / service restart     |    Anything over ~100 lines
  Git hooks                             |    CSV/database operations
  Environment setup scripts             |    Unit tests / reusable libraries

The 100-Line Rule

If your Bash script passes 100 lines, ask: "Is this still glue, or is this logic?" Glue connects programs. Logic transforms data, makes decisions, handles errors. Bash is great glue. Bash is terrible logic.

Three Signals It's Time to Switch

  1. You're building data structures. declare -A and naming conventions like host_1_ip, host_1_port? Python's dicts and classes will save you hours.
  2. You're parsing structured data. If you're piping jq through awk back into jq, you're writing a bad Python script in Bash.
  3. You need error recovery, not just detection. set -e exits on failure. try/except catches specific errors, retries, falls back, logs context, and continues.

Mental Model: Bash is a text stream processor. Everything is a string. Every tool communicates via text piped between processes. Python is a data structure processor. You parse text into objects once, then work with real types — lists, dicts, integers, booleans. The moment your Bash script starts doing math on strings or building data structures with associative arrays, you've crossed the line into Python territory.

Etymology: Python was created by Guido van Rossum in 1991 and named after Monty Python's Flying Circus, not the snake. The language's design philosophy is captured in "The Zen of Python" (import this), which includes "Readability counts" and "There should be one — and preferably only one — obvious way to do it."


Part 2: Running the Thing

The REPL — Your New Scratch Terminal

# You already do this all day:
$ echo "hello"
hello

# Python has the same thing:
$ python3
>>> print("hello")
hello
>>> 2 + 2
4
>>> exit()

The >>> prompt is Python's interactive shell — the REPL (Read-Eval-Print Loop). It's your bash -c equivalent for testing one-liners.

Scripts and Shebangs

#!/usr/bin/env python3
print("I am a python script")

Same pattern as Bash: chmod +x script.py, run with ./script.py. The env trick finds whichever python3 is in your $PATH.

Gotcha: Outside a virtual environment, always use python3 — the bare python command is not uniform across Unix systems (per PEP 394, it may point to Python 2, Python 3, or not exist at all). Inside an activated virtual environment, python is fine and usually preferred — the venv guarantees it points to the correct interpreter. Python 2 reached end of life on January 1, 2020.

Quick One-Liners from the Shell

# Version check
python3 -c "import sys; print(sys.version)"

# Pretty-print JSON (no jq needed)
python3 -m json.tool < file.json

# Instant HTTP file server
python3 -m http.server 8000

# Generate a random password
python3 -c "import secrets; print(secrets.token_urlsafe(32))"

# Base64 encode
python3 -c "import base64; print(base64.b64encode(b'secret').decode())"

# Check if a module is installed
python3 -c "import boto3; print(boto3.__version__)"

Part 3: Variables and Types

In Bash, everything is a string. In Python, data has types:

name = "webserver"       # str (string)
count = 42               # int (integer)
uptime = 99.7            # float (decimal)
is_running = True        # bool (boolean — capital T)
last_error = None        # NoneType (like null — "nothing here")

No $ prefix. No quoting disasters. No declare -i. The value itself tells Python what type it is.

Type Conversion — Explicit Is Better

port = "8080"          # String from a config file
port_int = int(port)   # Now it's an integer
print(port_int + 1)    # 8081

int("not_a_number")    # ValueError — Python yells immediately

Python yells at you instead of silently doing the wrong thing. This is a feature.

Mental Model: Bash is text-in, text-out. You convert between types by piping strings through commands (bc, awk, printf). In Python, data carries its type with it, and you convert explicitly with int(), str(), float().
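Explicit conversion pairs naturally with the error handling covered later. A minimal sketch (the `to_int` helper name is my own, not from the standard library) showing how to convert config-file strings safely with a fallback instead of crashing:

```python
def to_int(value, default=None):
    """Convert a string to int, returning a default instead of raising."""
    try:
        return int(value.strip())
    except (ValueError, AttributeError):
        return default

print(to_int("8080"))        # 8080
print(to_int("not_a_port"))  # None
print(to_int(" 80 ", 0))     # 80
```

This is the Python equivalent of defensive `[[ $x =~ ^[0-9]+$ ]]` checks in Bash, except the invalid case is explicit rather than a silent empty string.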


Part 4: f-Strings

String formatting in Bash is a minefield of quoting rules. Python f-strings are the fix:

host = "web-03"
port = 8080
print(f"Connecting to {host}:{port}")                     # Variables
print(f"Status: {port + 1}")                              # Expressions
print(f"{'HOST':<20} {'PORT':>5}")                        # Alignment
print(f"Uptime: {99.734:.1f}%")                           # Decimal places
print(f"Size: {1048576:,} bytes")                         # Thousands separator

Anything inside {} in an f-string is a Python expression. Variables, math, function calls — all valid. No more escaping nested quotes inside $() inside double quotes inside heredocs.

History: f-strings were introduced in Python 3.6 (2016) via PEP 498. The older formats — % formatting (from C's printf) and .format() — still work but are more verbose. f-strings won because they put the value right next to where it appears.


Part 5: Data Structures

Lists — Arrays That Actually Work

servers = ["web-01", "web-02", "db-01"]
print(len(servers))               # 3
print(servers[0])                 # "web-01"
print(servers[-1])                # "db-01" (negative indexing!)
servers.append("cache-01")        # Append

# Filter (list comprehension)
high_ports = [p for p in [80, 443, 8080, 9090] if p > 1024]  # [8080, 9090]

# Check membership
if 443 in [80, 443, 8080]:
    print("HTTPS is configured")

List comprehensions ([x for x in items if condition]) are Python's pipeline filters — they read left to right, same as cmd | grep condition.

Dicts — The Data Structure That Replaces Half Your Scripts

counts = {
    "sshd": 47,
    "cron": 12,
    "kernel": 8,
}

print(counts["sshd"])              # 47
print(counts.get("nginx", 0))     # 0 (default if missing — no crash)

for program, count in counts.items():
    print(f"{program}: {count}")

# Sort by value
for prog, n in sorted(counts.items(), key=lambda x: x[1], reverse=True):
    print(f"{prog:<15} {n:>5}")

Key insight: dict.get(key, default) returns the default instead of crashing on a missing key. This eliminates the most common dict-related bug.

Counter — The awk Killer

from collections import Counter

# The Bash way:
# awk '{print $5}' /var/log/syslog | cut -d'[' -f1 | sort | uniq -c | sort -rn

# The Python way:
counts = Counter()
with open("/var/log/syslog") as f:
    for line in f:
        parts = line.split()
        if len(parts) >= 5:
            program = parts[4].split("[")[0].rstrip(":")
            counts[program] += 1

for program, n in counts.most_common(20):
    print(f"{program:<20} {n:>6}")

Five Bash commands piped together → one Python data structure. And counts is an object you can query, filter, serialize to JSON, or combine with another Counter. The Bash pipeline gave you printed text. Python gave you data.

War Story: A team had a 120-line Bash script monitoring log volume across 15 services. It used nested for loops, three declare -A arrays, and had 9 bugs related to uninitialized array keys (Bash returns empty strings for missing keys, which silently breaks arithmetic). The Python rewrite used {service: Counter()} and was 40 lines. The three hardest bugs in the Bash version were impossible in Python because Counter() initializes missing keys to zero automatically.

defaultdict — Auto-Initializing Dicts

from collections import defaultdict

# Group log lines by status code
lines_by_status = defaultdict(list)
with open('access.log') as f:
    for line in f:
        status = line.split()[8]
        lines_by_status[status].append(line.rstrip())

# No "if key not in dict" boilerplate needed
print(f"Unique 500 errors: {len(lines_by_status['500'])}")

Sets — Fast Membership Tests

seen = {"web-01", "web-02"}
if "web-01" in seen:
    print("duplicate")
# Sets are O(1) lookup — use for membership tests and deduplication

# Dedup a list while preserving order (Python 3.7+)
hosts = ["web-01", "db-01", "web-01", "cache-01"]
unique = list(dict.fromkeys(hosts))  # ["web-01", "db-01", "cache-01"]

Tuples — Immutable Lists

# Tuples can't be changed after creation (immutable)
point = (10, 20)
host_port = ("web-01", 8080)

# Tuple unpacking — use instead of $1, $2
host, port = host_port
print(f"{host}:{port}")

# Multiple return values from functions
total, errors, warnings = analyze_log("/var/log/syslog")

Part 6: Control Flow

Conditionals — No More Bracket Roulette

if status == "running":
    print("Service is up")
elif count > 10:
    print("Above threshold")
else:
    print("Something else")

No then. No fi. No semicolons. No brackets. Indentation defines the block.

Use is None for None checks, not == None. is tests identity, == tests equality. None is a singleton, so is is both faster and semantically correct:

result = get_server_status()
if result is None:       # Correct
    print("No response")
if result is not None:   # Correct negation
    print(f"Got: {result}")

Truthiness

# These are "falsy" (treated as False in if statements)
False, 0, 0.0, "", [], {}, None

# Everything else is "truthy"
# So you can write:
if servers:             # True if list is non-empty
    print("We have servers")
if name:                # True if string is non-empty
    print(f"Hello, {name}")

Loops

# Iterate a list
for server in ["web-01", "web-02", "db-01"]:
    print(f"Checking {server}")

# Range
for i in range(10):      # 0 through 9
    print(i)

# enumerate — index + value (no manual counter)
for i, server in enumerate(servers):
    print(f"{i}: {server}")

# zip — walk two lists in lockstep
for host, port in zip(["web-01", "db-01"], [80, 5432]):
    print(f"{host}:{port}")

enumerate() and zip() are the two loop tools you'll use most.

File Reading

with open("/var/log/syslog") as f:
    for line in f:
        print(line, end="")     # end="" because line already has \n

The with keyword is a context manager — it automatically closes the file when you're done, even if an error occurs. It's Python's version of trap cleanup EXIT, except scoped to one block.
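You can build your own context managers for any setup/teardown pair. A minimal sketch using `contextlib.contextmanager` (the `scratch_dir` name is my own) that mirrors `tmp=$(mktemp -d); trap 'rm -rf $tmp' EXIT`:

```python
import shutil
import tempfile
from contextlib import contextmanager

@contextmanager
def scratch_dir():
    """Temp working directory, removed on exit — block-scoped trap cleanup."""
    path = tempfile.mkdtemp()
    try:
        yield path               # Code inside the with-block runs here
    finally:
        shutil.rmtree(path)      # Always runs, even if the block raised

with scratch_dir() as tmp:
    print(f"working in {tmp}")
# The directory is gone here, error or not
```

Unlike a Bash `trap`, which is global to the script, the cleanup here is scoped to exactly one block and composes — you can nest several `with` statements without them clobbering each other.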


Part 7: Functions

Bash functions return exit codes (0–255) and communicate via stdout. Python functions return actual values:

def get_disk_usage(path="/"):
    import shutil
    total, used, free = shutil.disk_usage(path)
    return round(used / total * 100, 1)    # Returns a float, not a string

usage = get_disk_usage()          # No subshell. No text parsing.
print(f"Usage: {usage}%")

# Named arguments with defaults
def check_port(host, port, timeout=3):
    import socket
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
        sock.close()
        return True
    except (ConnectionRefusedError, TimeoutError):
        return False

# Multiple return values
def analyze_log(path):
    errors = warnings = total = 0
    with open(path) as f:
        for line in f:
            total += 1
            if "ERROR" in line: errors += 1
            elif "WARN" in line: warnings += 1
    return total, errors, warnings

total, errors, warnings = analyze_log("/var/log/syslog")

| Feature          | Bash                           | Python                               |
|------------------|--------------------------------|--------------------------------------|
| Return data      | echo + capture with $()        | return value                         |
| Return type      | Always a string (stdout)       | Any type: int, str, list, dict, bool |
| Arguments        | Positional: $1, $2             | Named, with defaults                 |
| Scope            | Global by default (need local) | Local by default                     |
| Multiple returns | Impossible                     | return a, b, c (tuple)               |

Under the Hood: In Bash, result=$(my_function) runs the function in a subshell. Variable changes inside $() disappear when the subshell exits. This is the #1 source of "why didn't my variable update?" bugs. Python functions share the same process and return data directly.

Type Hints and Dataclasses

# Type hints improve readability and tooling
def classify_load(value: float) -> str:
    if value >= 10:
        return "critical"
    if value >= 5:
        return "warning"
    return "ok"

# Dataclasses for structured data
from dataclasses import dataclass, field

@dataclass
class Host:
    name: str
    address: str
    port: int = 22
    tags: list[str] = field(default_factory=list)

Use type hints on function boundaries first — that gets most of the value. Use dataclasses when the data shape matters (named fields, sane defaults, fewer typo bugs).
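A short usage sketch of the Host dataclass defined above (the example hosts are hypothetical), showing what the generated constructor and `asdict` give you for free:

```python
from dataclasses import dataclass, field, asdict

@dataclass
class Host:
    name: str
    address: str
    port: int = 22
    tags: list[str] = field(default_factory=list)

web = Host("web-01", "10.0.1.5", tags=["frontend"])
db = Host("db-01", "10.0.2.9", port=5432)

print(web.port)    # 22 — the default kicked in
print(asdict(db))  # Plain dict, ready for json.dumps()
```

Compare with the `declare -A host_1_ip`/`host_1_port` naming-convention hack from Part 1: here each host is one typed object, and a typo like `web.prot` fails loudly instead of silently reading an empty string.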

Mutable Default Arguments

# BAD: mutable default argument (shared across all calls!)
def add_host(name, tags=[]):
    tags.append(name)
    return tags

# GOOD: use None and create fresh
def add_host(name: str, tags: list[str] | None = None) -> list[str]:
    tags = [] if tags is None else tags
    tags.append(name)
    return tags

The default [] is created once at function definition time and shared across every call. Call add_host("a") then add_host("b") and the second call returns ["a", "b"]. This is one of Python's most common footguns.


Part 8: File I/O

# Read entire file
with open("/etc/hostname") as f:
    content = f.read().strip()

# Process line by line (memory-efficient for huge files)
with open("/etc/passwd") as f:
    for line in f:
        parts = line.strip().split(":")
        print(f"{parts[0]:<20} {parts[-1]}")

# Write to a file
with open("/tmp/config.txt", "w") as f:
    f.write("server=web-03\n")
    f.write("port=8080\n")

| Mode | Meaning            | Bash Equivalent |
|------|--------------------|-----------------|
| "r"  | Read (default)     | < file          |
| "w"  | Write (truncates!) | > file          |
| "a"  | Append             | >> file         |

Gotcha: "w" mode truncates the file immediately on open — before you write anything. For atomic writes, write to a temp file then rename it (same pattern as safe config updates in Bash).

Atomic File Writes

import os
import tempfile
from pathlib import Path

def atomic_write(path, content):
    """Write content to a file atomically — safe for config files."""
    path = Path(path)
    fd, tmp_path = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:   # Reuse the descriptor mkstemp opened
            f.write(content)
        os.replace(tmp_path, path)      # Atomic on the same filesystem
    except Exception:
        Path(tmp_path).unlink(missing_ok=True)
        raise

Part 9: Error Handling

In Bash, error handling is set -e and prayer. Python has try/except — structured, specific, and reliable:

try:
    with open("/var/log/syslog") as f:
        content = f.read()
except FileNotFoundError:
    print("Syslog not found (are you on macOS?)")
    content = ""
except PermissionError:
    print("Permission denied — run with sudo?")
    content = ""

# Catch, log, and continue (what set -e can't do)
results = {}
for server in ["web-01", "web-02", "web-03"]:
    try:
        results[server] = check_server(server)
    except ConnectionError as e:
        print(f"WARN: {server} unreachable: {e}")
        results[server] = None
        # Script continues instead of dying

The Exception Hierarchy You Need

BaseException
 └── Exception
      ├── OSError
      │    ├── FileNotFoundError  # File doesn't exist
      │    ├── PermissionError    # Can't read/write
      │    ├── ConnectionError    # Network failure
      │    └── TimeoutError       # Operation timed out
      ├── ValueError              # Wrong value (int("abc"))
      ├── KeyError                # Dict key doesn't exist
      ├── IndexError              # List index out of range
      └── TypeError               # Wrong type (len(42))

KeyboardInterrupt (Ctrl+C) derives from BaseException, not Exception — so a bare except Exception: won't swallow Ctrl+C. That's deliberate.

try / except / else / finally

try:
    f = open("/var/log/syslog")
    data = f.read()
except FileNotFoundError:
    print("File missing")
    data = ""
else:
    # Only runs if NO exception occurred
    print(f"Read {len(data)} bytes")
finally:
    # ALWAYS runs — cleanup goes here
    print("Done")

block/rescue Equivalent

# Ansible-style block/rescue pattern
try:
    deploy_application()
    verify_health()
except Exception:
    rollback_application()
    notify_team("Deploy failed, rolled back")
finally:
    log_deployment_attempt()

Mental Model: set -e is a fire alarm — when something goes wrong, everybody evacuates. try/except is a fire extinguisher — you identify what's burning, put it out, and keep working.
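The "retries, falls back, continues" capability from the three-signals list can be packaged once and reused. A minimal sketch (the `with_retries` helper and its parameters are my own, not a standard-library API) of retry with exponential backoff built from plain try/except:

```python
import time

def with_retries(func, attempts=3, base_delay=1.0):
    """Retry a flaky operation with exponential backoff — what set -e can't do."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except (ConnectionError, TimeoutError) as e:
            if attempt == attempts:
                raise                                # Out of attempts: re-raise
            delay = base_delay * 2 ** (attempt - 1)  # 1s, 2s, 4s, ...
            print(f"WARN: attempt {attempt} failed ({e}), retrying in {delay}s")
            time.sleep(delay)
```

Usage would look like `status = with_retries(lambda: check_server("web-01"))`. Note it only catches the transient network errors; a ValueError from a genuine bug still crashes immediately, which is what you want.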


Part 10: String Methods

Python strings have methods that replace most sed/awk/cut one-liners:

line = "  Mar 23 04:12:03 web-prod-03 sshd[28410]: Failed password  "

line.strip()                          # Remove leading/trailing whitespace
line.split()                          # Split on whitespace (like awk default)
line.split(":")                       # Split on colons (like cut -d':')
", ".join(["web-01", "web-02"])       # "web-01, web-02"
"sshd[28410]:".startswith("sshd")     # True
"Hello World".replace("World", "Ops") # "Hello Ops"
"Failed password" in line             # True (substring check)
"warning".upper()                     # "WARNING"

# Chaining — extract program name from syslog line
# Bash: echo "$line" | awk '{print $5}' | cut -d'[' -f1 | tr -d ':'
program = line.split()[4].split("[")[0].rstrip(":")
# "sshd"

One line, no pipes, no subprocesses, and it returns a Python string you can use directly in a dict, comparison, or f-string.

Fact: Python strings are immutable — every method returns a new string. The original is unchanged. This means you can never accidentally corrupt data by modifying it in two places at once.


Part 11: Imports and the Standard Library

# Standard library — ships with Python, no install required
import os                    # OS interactions (env vars, paths, PIDs)
import sys                   # Python runtime (args, exit, stdin/stdout)
import json                  # JSON parsing/writing
import re                    # Regular expressions
import subprocess            # Run shell commands
import datetime              # Dates and times
import collections           # Counter, defaultdict
import socket                # Low-level networking
import shutil                # File copying, disk usage
import csv                   # CSV file reading/writing
import hashlib               # Hashing (md5, sha256)
import argparse              # CLI argument parsing
import logging               # Structured logging
import tempfile              # Temporary files
from pathlib import Path     # File path operations

# Third-party — install with pip
import requests              # HTTP requests (better curl)
import yaml                  # YAML parsing
import boto3                 # AWS SDK

"Batteries included" — Python's standard library ships with modules for nearly everything. The phrase was coined in the late 1990s. On a server where you can't install packages, you still have json, csv, re, pathlib, subprocess, logging, and more.

The Main Guard

#!/usr/bin/env python3

def main():
    print("This only runs when executed directly")

if __name__ == "__main__":
    main()

When Python runs a file directly, __name__ is "__main__". When imported as a module, it's the module name. This lets your file work as both a script and a reusable library.


Part 12: subprocess — The Escape Hatch

import subprocess

# Simple command
result = subprocess.run(
    ["df", "-h", "/"],
    capture_output=True,
    text=True,
    check=True,          # Raise on non-zero exit
)
print(result.stdout)

# Parse JSON output from a command
result = subprocess.run(
    ["kubectl", "get", "pods", "-o", "json"],
    capture_output=True, text=True, check=True,
)
pods = json.loads(result.stdout)

# With timeout
result = subprocess.run(
    ["helm", "list"],
    capture_output=True, text=True, timeout=30,
)

# Stream output in real time
process = subprocess.Popen(
    ["ansible-playbook", "site.yml"],
    stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True,
)
for line in process.stdout:
    print(line, end='')

The shell=True Footgun

# NEVER DO THIS with user input:
subprocess.run(f"ping -c 1 {hostname}", shell=True)  # Shell injection!
# What if hostname is "google.com; rm -rf /"?

# SAFE: pass arguments as a list
subprocess.run(["ping", "-c", "1", hostname])

When to Use subprocess vs Native Python

| Use subprocess           | Use native Python         |
|--------------------------|---------------------------|
| systemctl restart nginx  | Parsing a file (open())   |
| iptables -L              | HTTP requests (requests)  |
| docker ps                | JSON parsing (json)       |
| git log --oneline        | File operations (pathlib) |
| aws CLI (quick one-offs) | String matching (re)      |

Gotcha: The most common mistake when Bash experts start writing Python is calling subprocess.run() for everything. If you're writing subprocess.run(["grep", "ERROR", logfile]), you're paying Python's overhead without getting its benefit. Open the file yourself and test "ERROR" in line — no child process needed.
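A minimal sketch of that native replacement (the `count_errors` name is my own), which returns an integer you can threshold on instead of text you'd have to re-parse:

```python
def count_errors(logfile):
    """Count ERROR lines natively — no child process, and you get data back."""
    with open(logfile) as f:
        return sum(1 for line in f if "ERROR" in line)
```

The generator expression inside sum() streams the file line by line, so this stays memory-efficient even on multi-gigabyte logs — the same property as grep, without shelling out.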


Part 13: pathlib — Files Without the Pain

from pathlib import Path

# Path manipulation (no more os.path.join())
config = Path("/etc/myapp") / "conf.d" / "upstream.yaml"
backup = config.with_suffix(".yaml.bak")

# Properties
config.parent     # PosixPath('/etc/myapp/conf.d')
config.name       # 'upstream.yaml'
config.stem       # 'upstream'
config.suffix     # '.yaml'

# Check existence
config.exists()
config.is_file()
config.is_dir()

# Read/write
content = config.read_text()
config.write_text("new content")

# Create directories
Path("/backup/myapp").mkdir(parents=True, exist_ok=True)

# Find files (like find command)
for log in Path("/var/log").glob("*.log"):
    size_mb = log.stat().st_size / (1024 * 1024)
    if size_mb > 100:
        print(f"Large log: {log} ({size_mb:.1f} MB)")

# Recursive glob
for yaml_file in Path("/etc").rglob("*.yaml"):
    print(yaml_file)

Trivia: The / operator for paths was added in Python 3.4 (2014). It works by overriding __truediv__, the same method that handles a / b for numbers.


Part 14: JSON and YAML

JSON

import json

# Parse from string
data = json.loads('{"status": "healthy", "uptime": 84600}')

# Parse from file
with open("response.json") as f:
    data = json.load(f)

# Write (pretty-printed)
print(json.dumps(data, indent=2))

# Navigate nested structures
pod_name = data["metadata"]["name"]
node = data["status"].get("hostIP", "unknown")  # Safe with default

YAML

import yaml  # pip install pyyaml

# Read a Kubernetes manifest
with open("deployment.yaml") as f:
    manifest = yaml.safe_load(f)

# Multi-document YAML (multiple --- separated docs)
with open("all-resources.yaml") as f:
    for doc in yaml.safe_load_all(f):
        if doc:
            print(f"{doc.get('kind')}: {doc['metadata']['name']}")

Security: Always use yaml.safe_load(), never yaml.load(). The unsafe version can execute arbitrary Python code embedded in YAML. This is a known attack vector, not a theoretical risk. If you see yaml.load(f) without a Loader argument, that's a security bug.

YAML's Type Surprises

gotchas = yaml.safe_load("""
norway_code: NO       # boolean False (YAML 1.1!)
version: 1.10         # float 1.1 (trailing zero dropped!)
port: 8080            # integer (fine)
""")
# "NO" becomes False, "1.10" becomes 1.1
# Always quote strings that could be misinterpreted

The Norway Problem: Country code NO being parsed as boolean False has caused real deployment failures. YAML 1.2 fixed this, but PyYAML still implements YAML 1.1.

TOML

Python 3.11+ includes tomllib in the standard library for reading TOML files (used by pyproject.toml):

import tomllib

with open("pyproject.toml", "rb") as f:
    config = tomllib.load(f)
print(config["project"]["name"])

Note: tomllib is read-only. If you need to write TOML, use the third-party tomli-w package.


Part 15: requests — The Better curl

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Basic GET
response = requests.get("http://api.internal:8080/health", timeout=10)
print(response.status_code)   # 200
print(response.json())        # Parsed JSON as a dict

# POST with JSON body
response = requests.post(
    "https://api.example.com/deploy",
    json={"version": "v2.0", "env": "prod"},
    headers={"Authorization": "Bearer mytoken"},
    timeout=10,
)
response.raise_for_status()   # Raises HTTPError for 4xx/5xx

Sessions with Retries — The Non-Negotiable Pattern

def get_session(retries=3, backoff_factor=0.5, timeout=10):
    """Create a requests session with automatic retries."""
    session = requests.Session()
    retry = Retry(
        total=retries,
        backoff_factor=backoff_factor,     # 0.5s, 1s, 2s between retries
        status_forcelist=[500, 502, 503, 504],
        allowed_methods=["GET", "HEAD"],
    )
    adapter = HTTPAdapter(max_retries=retry)
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session

session = get_session()
data = session.get("http://prometheus.internal:9090/api/v1/targets", timeout=10).json()

War Story: A monitoring script checked 40 endpoints every 60 seconds with no retry logic. During a 3-second 502 from a routine deploy, it fired 40 "service down" alerts to Slack and paged the on-call at 3 AM. After adding retries with 2-second backoff, false alerts dropped 90%.

Gotcha: requests.get(url) with no timeout blocks forever if the server doesn't respond. Your cron job piles up. You now have 30 zombie Python processes. Always set timeout=.


Part 16: Regular Expressions

import re

# Simple match (like grep)
if re.search(r"Failed password", line):
    print("SSH failure detected")

# Extract groups (like sed capture groups)
match = re.search(r"from (\d+\.\d+\.\d+\.\d+) port (\d+)", line)
if match:
    ip = match.group(1)
    port = match.group(2)

# Find all matches
ips = re.findall(r"\d+\.\d+\.\d+\.\d+", log_content)

# Replace (like sed 's/old/new/g')
cleaned = re.sub(r"\s+", " ", messy_text)

# Compile for performance (reuse the pattern)
LOG_PATTERN = re.compile(r"^(\S+) - - \[(.+?)\] \"(\S+) (\S+)")
for line in open("access.log"):
    match = LOG_PATTERN.match(line)
    if match:
        ip, timestamp, method, path = match.groups()
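
Named capture groups — `(?P<name>...)`, a standard `re` feature — make the same extraction self-documenting; you get fields by name instead of position. A sketch on a hypothetical access-log line:

```python
import re

LOG_PATTERN = re.compile(
    r"^(?P<ip>\S+) - - \[(?P<ts>.+?)\] \"(?P<method>\S+) (?P<path>\S+)"
)

line = '10.0.1.5 - - [23/Mar/2025:04:12:03 +0000] "GET /health HTTP/1.1" 200 512'
match = LOG_PATTERN.match(line)
if match:
    print(match.group("ip"), match.group("path"))  # Fields by name, not index
    print(match.groupdict())                       # The whole match as a dict
```

`groupdict()` is especially handy in infrastructure scripts: it hands you a dict you can feed straight into json.dumps() or a Counter.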

Part 17: Logging

import logging
import sys

def setup_logging(verbose=False):
    level = logging.DEBUG if verbose else logging.INFO
    logging.basicConfig(
        level=level,
        format='%(asctime)s %(levelname)s %(message)s',
        datefmt='%Y-%m-%d %H:%M:%S',
        handlers=[logging.StreamHandler(sys.stderr)],
    )
    return logging.getLogger(__name__)

log = setup_logging()
log.info("Starting backup for %d hosts", len(hosts))
log.warning("Host %s unreachable: %s", host, error)
log.error("Backup failed: %s", str(e))

JSON Logging (for Monitoring Pipelines)

import json
import sys
from datetime import datetime, timezone

def log_json(event, **kwargs):
    entry = {'event': event, 'timestamp': datetime.now(timezone.utc).isoformat()}
    entry.update(kwargs)
    print(json.dumps(entry), file=sys.stderr)

log_json("backup_complete", hosts=5, duration_seconds=142)

Note: datetime.utcnow() is deprecated as of Python 3.12. Use datetime.now(timezone.utc) instead — it returns a timezone-aware datetime, which prevents a whole class of "naive vs aware" comparison bugs.


Part 18: CLI Tools

argparse (Standard Library)

#!/usr/bin/env python3
"""Morning infrastructure health check."""
import argparse
import os

def main():
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument('--env', required=True, choices=['dev', 'staging', 'prod'])
    parser.add_argument('--dry-run', action='store_true')
    parser.add_argument('--verbose', '-v', action='count', default=0)
    parser.add_argument('--timeout', type=int,
                        default=int(os.environ.get('TIMEOUT', '30')))
    args = parser.parse_args()
    # args.env, args.dry_run, args.verbose, args.timeout

if __name__ == '__main__':
    main()

Click (Third-Party, More Powerful)

import click

@click.group()
@click.option('--verbose', '-v', is_flag=True)
@click.pass_context
def cli(ctx, verbose):
    """Infrastructure management tool."""
    ctx.ensure_object(dict)
    ctx.obj['verbose'] = verbose

@cli.command()
@click.argument('environment', type=click.Choice(['dev', 'staging', 'prod']))
@click.option('--region', '-r', default='us-east-1')
def list_servers(environment, region):
    """List servers in an environment."""
    servers = get_instances_by_tag('Environment', environment)
    for s in servers:
        click.echo(f"{s['id']:<22} {s['ip']:<16} {s['type']}")

@cli.command()
@click.argument('instance_id')
@click.confirmation_option(prompt='Stop this instance?')
def stop(instance_id):
    """Stop an EC2 instance."""
    stop_instance(instance_id)

Configuration Precedence Pattern

CLI flags  →  override  →  Environment variables  →  override  →  Config file  →  Defaults
(highest)                                                                        (lowest)

This matches how every serious CLI tool (kubectl, aws, terraform) works.
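A minimal sketch of that precedence chain, merging one layer at a time so later layers win. The MYTOOL_* variable names and the config.json path are made up for illustration:

```python
import argparse
import json
import os

DEFAULTS = {'region': 'us-east-1', 'timeout': 30}

def load_config(path='config.json', argv=None):
    """Merge defaults <- config file <- env vars <- CLI flags (highest wins)."""
    config = dict(DEFAULTS)

    # Layer 2: config file overrides defaults (if present)
    if os.path.exists(path):
        with open(path) as f:
            config.update(json.load(f))

    # Layer 3: environment variables override the file
    if 'MYTOOL_REGION' in os.environ:
        config['region'] = os.environ['MYTOOL_REGION']
    if 'MYTOOL_TIMEOUT' in os.environ:
        config['timeout'] = int(os.environ['MYTOOL_TIMEOUT'])

    # Layer 4: CLI flags override everything; default=None means "not given"
    parser = argparse.ArgumentParser()
    parser.add_argument('--region')
    parser.add_argument('--timeout', type=int)
    args, _ = parser.parse_known_args(argv)
    config.update({k: v for k, v in vars(args).items() if v is not None})
    return config
```

The key trick is that unset CLI flags default to None and are filtered out of the final `update()`, so an absent flag never clobbers a value from a lower layer.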


Part 19: boto3 — Automating AWS

import boto3
from botocore.exceptions import ClientError

ec2 = boto3.client('ec2', region_name='us-east-1')
s3 = boto3.client('s3')

# List instances by tag
def get_instances_by_tag(tag_key, tag_value):
    response = ec2.describe_instances(
        Filters=[
            {'Name': f'tag:{tag_key}', 'Values': [tag_value]},
            {'Name': 'instance-state-name', 'Values': ['running']},
        ]
    )
    instances = []
    for reservation in response['Reservations']:
        for instance in reservation['Instances']:
            instances.append({
                'id': instance['InstanceId'],
                'ip': instance.get('PrivateIpAddress'),
                'type': instance['InstanceType'],
            })
    return instances

# CRITICAL: Paginate all list operations
def list_all_s3_objects(bucket, prefix=''):
    paginator = s3.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get('Contents', []):
            yield obj['Key'], obj['Size']

# Error handling
try:
    ec2.stop_instances(InstanceIds=[instance_id])
except ClientError as e:
    if e.response['Error']['Code'] == 'InvalidInstanceID.NotFound':
        print(f"Instance {instance_id} not found")
    else:
        raise

Gotcha: boto3 reads credentials in this order: (1) explicit parameters, (2) environment variables, (3) ~/.aws/credentials, (4) EC2 instance metadata / ECS task role. Never hardcode credentials in code.

Gotcha: AWS APIs return at most 100-1000 results per call. If you have 5,000 instances and don't paginate, you only see the first 1,000. Use paginators for every describe_*, list_*, get_* call.

Trivia: boto3 is routinely the most-downloaded package on PyPI, pulling in hundreds of millions of downloads per month.


Part 20: paramiko — SSH from Python

import paramiko

def run_remote_command(host, user, key_path, command):
    """Run a command on a remote host via SSH."""
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    try:
        client.connect(hostname=host, username=user,
                       key_filename=key_path, timeout=10)
        stdin, stdout, stderr = client.exec_command(command, timeout=30)
        exit_code = stdout.channel.recv_exit_status()
        return {
            'host': host,
            'stdout': stdout.read().decode().strip(),
            'stderr': stderr.read().decode().strip(),
            'exit_code': exit_code,
        }
    finally:
        client.close()   # Always close, even on exception

Gotcha: If you don't close paramiko connections on exceptions, after 200 hosts you hit the file descriptor limit. Always use try/finally.

Security: In production, avoid AutoAddPolicy() — it accepts any host key without verification, making you vulnerable to MITM attacks. Use RejectPolicy() or WarningPolicy() and manage known hosts properly (e.g., client.load_system_host_keys()).


Part 21: Jinja2 — Templating Config Files

from jinja2 import Template, Environment, FileSystemLoader

# Inline template
tmpl = Template("Hello {{ name }}")
print(tmpl.render(name="world"))

# From files
env = Environment(loader=FileSystemLoader('templates/'),
                  trim_blocks=True, lstrip_blocks=True)
tmpl = env.get_template('nginx.conf.j2')
config = tmpl.render(
    service_name='myapp',
    backends=[
        {'ip': '10.0.1.10', 'port': 8080, 'weight': 100},
        {'ip': '10.0.1.11', 'port': 8080, 'weight': 100},
    ],
    domain='app.example.com',
)

# K8s manifest generation
K8S_TEMPLATE = Template("""
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ name }}
spec:
  replicas: {{ replicas }}
  template:
    spec:
      containers:
      - name: {{ name }}
        image: {{ image }}:{{ tag }}
""")

for svc in services:
    print("---")
    print(K8S_TEMPLATE.render(**svc))

Part 22: Concurrency

The GIL (Global Interpreter Lock)

The GIL allows only one thread to execute Python bytecode at a time. But it's released during I/O (network, disk, sleep). So:

| Workload | GIL Impact | Right Tool |
|---|---|---|
| I/O-bound (HTTP, SSH, file I/O) | Minimal | threading or asyncio |
| CPU-bound (math, parsing) | Severe — threads give zero speedup | multiprocessing |

Key fact for DevOps: Infrastructure scripts are almost always I/O-bound. The GIL does not matter for your work. Threading works great.

ThreadPoolExecutor — Parallel Fleet Operations

from concurrent.futures import ThreadPoolExecutor, as_completed

def check_host_health(host):
    try:
        resp = requests.get(f'http://{host}:8080/health', timeout=5)
        return {'host': host, 'healthy': resp.ok}
    except requests.exceptions.RequestException as e:
        return {'host': host, 'healthy': False, 'error': str(e)}

def parallel_health_check(hosts, max_workers=20):
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        future_to_host = {
            executor.submit(check_host_health, host): host
            for host in hosts
        }
        for future in as_completed(future_to_host):
            results.append(future.result())
    return results

# 200 hosts checked in parallel — seconds instead of minutes
results = parallel_health_check(all_hosts, max_workers=30)
unhealthy = [r for r in results if not r['healthy']]

Fleet Operation Pattern

def fleet_operation(hosts, operation, max_workers=20, fail_fast=False):
    """Run an operation across a fleet of hosts in parallel."""
    results = {'success': [], 'failed': []}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(operation, host): host for host in hosts}
        for future in as_completed(futures):
            host = futures[future]
            try:
                result = future.result()
                results['success'].append({'host': host, 'result': result})
            except Exception as e:
                results['failed'].append({'host': host, 'error': str(e)})
                if fail_fast:
                    executor.shutdown(wait=False, cancel_futures=True)
                    break
    return results

Part 23: Data Wrangling — Log Parsing at Scale

Reading Large Files Without Killing the Server

# BAD: loads 800 MB into RAM (plus ~3x object overhead)
lines = open('access.log').readlines()

# GOOD: streams line by line — constant memory
with open('access.log') as f:
    for line in f:
        process(line)

# Gzipped files
import gzip
with gzip.open('access.log.gz', 'rt', errors='replace') as f:
    for line in f:
        process(line)

Multi-Aggregation in One Pass

from collections import Counter
from pathlib import Path
import gzip

ip_counts = Counter()
status_counts = Counter()
endpoint_counts = Counter()

for log_file in sorted(Path('/var/log/nginx').glob('access.log.*.gz')):
    with gzip.open(log_file, 'rt', errors='replace') as f:
        for line in f:
            parts = line.split()
            if len(parts) >= 10:
                ip_counts[parts[0]] += 1
                status_counts[parts[8]] += 1
                endpoint_counts[parts[6]] += 1

# Three aggregations, one pass, constant memory
print("Top IPs:", ip_counts.most_common(10))
print("Status codes:", status_counts.most_common())

Generators for Memory-Efficient Pipelines

def grep_file(filepath, pattern):
    """Memory-efficient file search — like grep but returns structured data."""
    with open(filepath) as f:
        for line_num, line in enumerate(f, 1):
            if pattern in line:
                yield line_num, line.rstrip()

# Lazy evaluation — only processes what's needed
for num, line in grep_file("/var/log/syslog", "ERROR"):
    print(f"{num}: {line}")

Part 24: Kubernetes Client

from kubernetes import client, config  # pip install kubernetes

# Load kubeconfig
try:
    config.load_incluster_config()       # Inside a pod
except config.ConfigException:
    config.load_kube_config()            # From ~/.kube/config

v1 = client.CoreV1Api()

# List pods
pods = v1.list_namespaced_pod('default')
for p in pods.items:
    print(f"{p.metadata.name}: {p.status.phase}")

# Find CrashLoopBackOff pods across all namespaces
all_pods = v1.list_pod_for_all_namespaces()
for pod in all_pods.items:
    for cs in (pod.status.container_statuses or []):
        waiting = cs.state.waiting
        if waiting and waiting.reason == 'CrashLoopBackOff':
            print(f"  {pod.metadata.namespace}/{pod.metadata.name} "
                  f"({cs.name}) - {cs.restart_count} restarts")

Part 25: Debugging

print(f"DEBUG: variable = {variable!r}")  # !r prints repr() — quotes reveal '5' (str) vs 5 (int)

breakpoint() — The Built-In Debugger

def process_data(records):
    for record in records:
        breakpoint()  # Drops into pdb interactive debugger
        transform(record)

Essential pdb Commands

| Command | Short | What It Does |
|---|---|---|
| next | n | Execute next line (step over) |
| step | s | Step into function call |
| continue | c | Continue until next breakpoint |
| print expr | p expr | Print expression value |
| list | l | Show source code around current line |
| where | w | Show call stack |
| quit | q | Quit debugger |

# Disable all breakpoints via environment
PYTHONBREAKPOINT=0 python3 script.py

# Use ipdb instead of pdb (better UI)
PYTHONBREAKPOINT=ipdb.set_trace python3 script.py

Remote Debugging (Docker, Production)

import debugpy
debugpy.listen(("0.0.0.0", 5678))
print("Waiting for debugger...")
debugpy.wait_for_client()

Profiling

# Find slow functions
python3 -m cProfile -s cumtime myscript.py 2>&1 | head -30

# Check syntax without running
python3 -m py_compile myscript.py
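cProfile can also be driven from inside a script when you want to profile just one call instead of the whole program — a small sketch using only the standard library (the busy() function is a made-up workload):

```python
import cProfile
import io
import pstats

def busy():
    """A deliberately slow function to profile."""
    return sum(i * i for i in range(200_000))

pr = cProfile.Profile()
pr.enable()
busy()
pr.disable()

# Render the top 5 entries sorted by cumulative time
out = io.StringIO()
pstats.Stats(pr, stream=out).sort_stats('cumulative').print_stats(5)
print(out.getvalue())
```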

Part 26: Virtual Environments and Packaging

Virtual Environments

# Create
python3 -m venv .venv

# Activate
source .venv/bin/activate

# Install packages (goes into .venv only)
pip install requests boto3

# Freeze dependencies
pip freeze > requirements.txt

# Deactivate
deactivate

A venv is just a directory. Delete it and you're clean. Never commit .venv/ to git.

requirements.txt

# Direct dependencies (loose — for libraries)
requests>=2.28
flask>=3.0

# Pinned (reproducible — for applications)
requests==2.31.0
flask==3.0.2
Werkzeug==3.0.1

pip-tools — The Better Way

pip install pip-tools

# requirements.in — what you WANT
cat requirements.in
# requests>=2.28
# flask>=3.0

# Compile to pinned requirements.txt — what you GET
pip-compile requirements.in

# Install exactly what's pinned
pip-sync requirements.txt

pyproject.toml (Modern Standard)

[project]
name = "my-infra-tool"
version = "1.0.0"
requires-python = ">=3.11"
dependencies = [
    "requests>=2.28",
    "boto3>=1.28",
    "click>=8.0",
]

[project.scripts]
infra-check = "my_tool.cli:main"

uv — The Future

# uv is a Rust-based replacement for pip, pip-tools, virtualenv, and pyenv
# 10-100x faster than pip
pip install uv
uv pip install requests
uv venv .venv
uv pip compile requirements.in

Trivia: pip didn't exist until 2008. Before that, packages were installed with easy_install, which couldn't even uninstall packages. The name "pip" is recursive: "pip installs packages."


Part 27: Testing

# test_health.py
import pytest

def test_parse_health_response():
    response = {"status": "healthy", "uptime": 84600}
    assert response["status"] == "healthy"
    assert response["uptime"] > 0

def test_parse_unhealthy_response():
    response = {"status": "degraded", "errors": ["disk_full"]}
    assert response["status"] != "healthy"
    assert len(response["errors"]) > 0

def test_missing_key_uses_default():
    response = {}
    status = response.get("status", "unknown")
    assert status == "unknown"

# Run tests
pip install pytest
pytest test_health.py -v
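When several cases share the same assertion shape, pytest.mark.parametrize collapses them into a table of inputs and expected outputs. A small sketch — parse_status is a hypothetical helper under test, not part of the guide's earlier code:

```python
import pytest

def parse_status(response):
    """Return True only when a health-check payload reports 'healthy'."""
    return response.get("status", "unknown") == "healthy"

@pytest.mark.parametrize("payload,expected", [
    ({"status": "healthy"}, True),
    ({"status": "degraded"}, False),
    ({}, False),                       # missing key falls back to "unknown"
])
def test_parse_status(payload, expected):
    assert parse_status(payload) is expected
```

Each tuple becomes its own test case in the `pytest -v` output, so one failing input doesn't hide the others.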

Part 28: Footguns — Mistakes That Turn Automation Into Liability

1. Hardcoding AWS Credentials

Your aws_access_key_id in a Python file gets committed to Git. Someone runs trufflehog. AWS sends you a bill for 200 GPU instances mining crypto.

Fix: Use environment variables, AWS profiles, or IAM roles. boto3 checks ~/.aws/credentials and instance metadata automatically.

2. No Timeout on HTTP Requests

requests.get(url) with no timeout blocks forever. Your cron job piles up. 30 zombie Python processes consuming memory.

Fix: Always timeout=(5, 30) — 5s to connect, 30s to read.

3. subprocess with shell=True and User Input

Shell injection vulnerability. A hostname containing ; rm -rf / gets executed.

Fix: Pass arguments as a list: subprocess.run(["ping", "-c", "1", hostname]).

4. Not Paginating AWS API Calls

Script works in dev (10 instances), returns wrong results in prod (2,000 instances) — only first 1,000 visible.

Fix: Use paginators for every AWS list operation.

5. Non-Atomic File Writes

Process killed mid-write → half-written config → service crashes.

Fix: Write to temp file, then rename. rename() on same filesystem is atomic on Linux.
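A sketch of the pattern with the standard library: create the temp file in the destination directory (rename is only atomic within one filesystem), fsync before renaming, and clean up on failure:

```python
import os
import tempfile

def atomic_write(path, data):
    """Write data to path atomically: temp file beside it, then rename."""
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=dirname, prefix='.tmp-')
    try:
        with os.fdopen(fd, 'w') as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())       # data on disk before the rename
        os.replace(tmp_path, path)     # atomic on POSIX (same filesystem)
    except BaseException:
        os.unlink(tmp_path)            # never leave half-written temp files
        raise
```

Readers see either the old file or the complete new one — never a partial write.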

6. Catching Exception Instead of Specific Exceptions

except Exception: pass silently swallows NameErrors, ConnectionErrors, everything. Script reports success when it did nothing.

Fix: Catch specific exceptions. Let unexpected ones crash loudly.
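One way this looks in practice — a hypothetical read_config that handles exactly the two failures it expects and lets everything else crash:

```python
import json

def read_config(path):
    """Parse a JSON config; tolerate only the failures we planned for."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}                      # missing file: fall back to defaults
    except json.JSONDecodeError as e:
        raise SystemExit(f"Invalid JSON in {path}: {e}")
    # Anything else (PermissionError, IsADirectoryError, ...) propagates loudly
```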

7. Sequential Fleet Operations

SSHing into 500 servers one at a time. Each takes 2-3 seconds. Script takes 25 minutes.

Fix: ThreadPoolExecutor(max_workers=20). Same operation in under a minute.

8. Loading Entire Large Files into Memory

f.readlines() on a 5 GB file. Python uses ~15 GB (object overhead). OOM killer terminates your app.

Fix: Stream line by line: with open(path) as f: for line in f.

9. Using os.system Instead of subprocess

os.system("systemctl restart nginx") — can't capture stdout, can't get exit code reliably, can't handle arguments safely.

Fix: subprocess.run() with capture_output=True and check=True.

10. No Logging in Automation Scripts

Script runs via cron, fails. Nobody knows because it only printed to stdout and nobody reads root's email. Failing for 2 weeks.

Fix: Use logging module. Log to stderr with timestamps and severity levels.

11. yaml.load() Instead of yaml.safe_load()

Security vulnerability — can execute arbitrary Python code from YAML. Common audit finding.

Fix: Always yaml.safe_load().


Glossary

| Term | Definition |
|---|---|
| Python | Interpreted, dynamically-typed language. Named after Monty Python, not the snake |
| REPL | Read-Eval-Print Loop — the >>> interactive prompt |
| f-string | Formatted string literal: f"Hello {name}" (Python 3.6+) |
| list | Ordered, mutable collection: [1, 2, 3] |
| dict | Key-value mapping: {"host": "web-01", "port": 80} |
| tuple | Ordered, immutable collection: (1, 2, 3) |
| Counter | Dict subclass for counting: Counter(words).most_common(10) |
| defaultdict | Dict with auto-initialized missing keys |
| list comprehension | Inline list creation: [x for x in items if condition] |
| generator | Lazy iterator using yield — processes one item at a time |
| exception | Error object for control flow: try/except |
| context manager | with statement — guarantees cleanup (file close, lock release) |
| decorator | Function wrapper: @retry(max_attempts=3) |
| venv | Virtual environment — isolated per-project dependencies |
| pip | Package installer: pip install requests |
| module | Importable Python file |
| package | Directory of modules (has __init__.py) |
| GIL | Global Interpreter Lock — one thread runs Python bytecode at a time. Released during I/O |
| breakpoint() | Built-in to enter the debugger (Python 3.7+) |
| pdb | Python's built-in interactive debugger |
| pathlib | Object-oriented file path handling: Path("/etc") / "nginx" |
| boto3 | AWS SDK for Python |
| requests | HTTP library — the better curl |
| PyYAML | YAML parser. Always use safe_load() |
| Jinja2 | Templating engine: {{ variable }}, {% for %} |
| Click | Decorator-based CLI framework |
| argparse | Standard library CLI argument parser |
| subprocess | Run shell commands from Python. Never shell=True with user input |
| idempotent | Safe to run multiple times without changing result |
| timeout | Upper bound on waiting — prevents hangs. Always set one |

Trivia and History

  1. Named after comedy, not a snake. Guido van Rossum named Python after Monty Python's Flying Circus. The docs use "spam," "eggs," and "ham" as variable names (from the Monty Python sketch) instead of "foo" and "bar."

  2. Christmas 1989 hobby project. Guido started Python during Christmas week 1989 as a successor to the ABC language. First public release (0.9.0) came in February 1991.

  3. The Benevolent Dictator. Guido held the title "Benevolent Dictator for Life" (BDFL) until he resigned in July 2018 after the contentious PEP 572 (walrus operator :=) debate. Python is now governed by a five-person Steering Council.

  4. The Zen of Python. Type import this in a Python interpreter to see 19 aphorisms by Tim Peters (written in 1999, codified as PEP 20 in 2004). The 20th was intentionally left blank.

  5. Indentation by design. Python's significant whitespace was deliberate, inspired by ABC and Donald Knuth's literate programming. Guido argued that since programmers indent anyway, the language should enforce it.

  6. The GIL controversy. The Global Interpreter Lock (added in 1992) prevents true multi-threaded parallelism. PEP 703 (accepted 2023) began the multi-year project to make it optional (expected ~Python 3.15+).

  7. The 11-year migration. Python 3.0 was released December 2008. Python 2.7 was sunset January 1, 2020 — an 11-year transition that became a cautionary tale about breaking backward compatibility.

  8. Python was infra before it was web. Guido created Python as a system administration scripting language in 1991. Web frameworks (Django 2005, Flask 2010) came much later. Python's first major use was file management and system scripting.

  9. Ansible's secret weapon. Ansible chose Python because it's installed by default on virtually every Linux distribution. Modules execute using the system Python — no agent needed. This "agentless" architecture was only possible because of Python's ubiquity.

  10. subprocess replaced five modules. Python's subprocess (2004) unified os.system, os.spawn*, os.popen*, popen2.*, and commands.*. Despite this, os.system() still appears in code written in 2025.

  11. The GIL doesn't matter for infra. Infrastructure scripts are I/O-bound (waiting for SSH, APIs, files). I/O-bound code benefits from threading even with the GIL.

  12. pip didn't exist until 2008. Before pip, easy_install couldn't even uninstall packages. "pip" = "pip installs packages" (recursive acronym).

  13. Click powers many modern CLI tools. Click (2014) became the go-to framework for larger Python CLIs. Flask's own command-line interface, Datasette, and hundreds of DevOps tools are built on it.

  14. Python replaced Perl. Perl's TIMTOWTDI ("There's More Than One Way To Do It") lost to Python's "There should be one obvious way." Readability won.

  15. uv is rewriting Python tooling in Rust. uv (2024, by Astral/Ruff creators) is 10-100x faster than pip. Replaces pip, pip-tools, virtualenv, and pyenv.


Flashcard Review

Basics

| Q | A |
|---|---|
| What is Python (one line)? | Interpreted, dynamically-typed language. Named after Monty Python, runs everywhere |
| How do you run a Python script? | python3 script.py or ./script.py with shebang #!/usr/bin/env python3 |
| What are Python's basic types? | str, int, float, bool, None |
| What is an f-string? | Formatted string: f"Hello {name}" — any expression inside {} |
| if __name__ == "__main__": — what does it do? | Runs code only when file is executed directly, not when imported |

Data Structures

| Q | A |
|---|---|
| list vs tuple? | List is mutable [1,2,3], tuple is immutable (1,2,3) |
| What is dict.get(key, default)? | Returns default instead of raising KeyError on missing key |
| What does Counter.most_common(10) return? | List of (key, count) tuples, sorted by count descending |
| What does defaultdict(list) do? | Auto-creates empty list for missing keys — no "if key not in dict" needed |
| What is a list comprehension? | Inline filter/transform: [x for x in items if condition] |

Operations

| Q | A |
|---|---|
| try/except vs Bash set -e? | try/except catches specific errors with recovery. set -e just exits |
| with open(file) as f: — why with? | Guarantees file is closed even on exceptions (context manager) |
| subprocess.run() — why pass a list not a string? | Avoids shell injection. shell=True with user input is a security hole |
| Why always set timeout= on requests? | Without it, the call blocks forever if the server doesn't respond |
| yaml.safe_load() vs yaml.load()? | safe_load prevents code execution from YAML. load is a security vulnerability |

Infrastructure

| Q | A |
|---|---|
| How does boto3 find credentials? | Explicit params → env vars → ~/.aws/credentials → instance metadata |
| Why must you paginate AWS API calls? | APIs return max 100-1000 results. Without pagination, you miss the rest |
| requests.get() vs curl? | requests gives you sessions, retries, JSON parsing, proper error types |
| When do you use threading vs multiprocessing? | Threading for I/O-bound (HTTP, SSH). Multiprocessing for CPU-bound (math) |
| What does ThreadPoolExecutor(max_workers=20) do? | Runs up to 20 tasks in parallel using a thread pool |

Debugging and Packaging

| Q | A |
|---|---|
| How do you enter the Python debugger? | breakpoint() in code, or python3 -m pdb script.py |
| What is a virtual environment? | Isolated Python installation with its own packages. Created with python3 -m venv .venv |
| What does pip freeze do? | Dumps all installed packages with exact versions |
| What is pip-tools? | Separates what you want (requirements.in) from what you get (requirements.txt) |

Drills

Drill 1: Parse JSON API Response (Easy)

Q: Write a Python one-liner to fetch a URL and pretty-print the JSON response.

Answer
# stdlib only (no pip install needed):
python3 -c "import json,urllib.request; print(json.dumps(json.loads(urllib.request.urlopen('http://localhost:8080/health').read()),indent=2))"

# With requests:
import requests
print(requests.get('http://localhost:8080/health', timeout=10).json())

Drill 2: Read and Filter YAML (Easy)

Q: Read a Kubernetes YAML file and print all container image names.

Answer
import yaml

with open('deployment.yaml') as f:
    doc = yaml.safe_load(f)

for c in doc['spec']['template']['spec']['containers']:
    print(f"{c['name']}: {c['image']}")

Drill 3: subprocess Safely (Easy)

Q: Run kubectl get pods -o json from Python and list pod names with their status.

Answer
import subprocess, json

result = subprocess.run(
    ['kubectl', 'get', 'pods', '-o', 'json'],
    capture_output=True, text=True, check=True,
)
for pod in json.loads(result.stdout)['items']:
    print(f"{pod['metadata']['name']}: {pod['status']['phase']}")
Key: list args (not string), `capture_output=True`, `check=True`, `text=True`.

Drill 4: pathlib File Processing (Easy)

Q: Find all .yaml files in a directory tree and count total lines.

Answer
from pathlib import Path

total = 0
for f in Path('.').rglob('*.yaml'):
    lines = len(f.read_text().splitlines())
    total += lines
    print(f"{f}: {lines} lines")
print(f"\nTotal: {total} lines")

Drill 5: Environment Variables with Validation (Easy)

Q: Read config from environment variables with defaults and validation.

Answer
import os, sys

def require_env(name):
    val = os.environ.get(name)
    if not val:
        print(f"ERROR: {name} required", file=sys.stderr)
        sys.exit(1)
    return val

config = {
    'db_host': os.environ.get('DB_HOST', 'localhost'),
    'db_port': int(os.environ.get('DB_PORT', '5432')),
    'db_name': require_env('DB_NAME'),
    'debug': os.environ.get('DEBUG', 'false').lower() == 'true',
}

Drill 6: HTTP Health Check (Medium)

Q: Check health endpoints for multiple services and exit non-zero if any fail.

Answer
import urllib.request, sys

SERVICES = {
    'api': 'http://localhost:8080/health',
    'frontend': 'http://localhost:3000/health',
}

failures = []
for name, url in SERVICES.items():
    try:
        req = urllib.request.urlopen(url, timeout=5)
        status = "OK" if req.getcode() == 200 else "FAIL"
    except Exception:
        status = "FAIL"
        failures.append(name)
    print(f"  {name}: {status}")

sys.exit(1 if failures else 0)

Drill 7: Log Parsing with Counter (Medium)

Q: Parse nginx access logs and report top 10 IPs.

Answer
from collections import Counter

ip_counts = Counter()
with open('/var/log/nginx/access.log') as f:
    for line in f:
        ip_counts[line.split()[0]] += 1

for ip, count in ip_counts.most_common(10):
    print(f"  {ip}: {count}")

Drill 8: Jinja2 Templating (Medium)

Q: Generate Kubernetes manifests from a template for multiple services.

Answer
from jinja2 import Template

TMPL = Template("""apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ name }}
spec:
  replicas: {{ replicas }}
  template:
    spec:
      containers:
      - name: {{ name }}
        image: {{ image }}:{{ tag }}
""")

for svc in [
    {'name': 'api', 'replicas': 3, 'image': 'myapp/api', 'tag': 'v2.1'},
    {'name': 'worker', 'replicas': 2, 'image': 'myapp/worker', 'tag': 'v2.1'},
]:
    print("---")
    print(TMPL.render(**svc))

Drill 9: Retry Decorator (Medium)

Q: Write a retry decorator with exponential backoff.

Answer
import time, functools

def retry(max_attempts=3, base_delay=1, backoff_factor=2):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception as e:
                    if attempt == max_attempts:
                        raise
                    delay = base_delay * (backoff_factor ** (attempt - 1))
                    print(f"Attempt {attempt} failed: {e}. Retrying in {delay}s...")
                    time.sleep(delay)
        return wrapper
    return decorator

@retry(max_attempts=3, base_delay=2)
def call_api():
    return requests.get(url, timeout=10).json()

Drill 10: Kubernetes CrashLoopBackOff Detection (Hard)

Q: Use the Python Kubernetes client to find all CrashLoopBackOff pods.

Answer
from kubernetes import client, config

try:
    config.load_incluster_config()
except config.ConfigException:
    config.load_kube_config()

v1 = client.CoreV1Api()
for pod in v1.list_pod_for_all_namespaces().items:
    for cs in (pod.status.container_statuses or []):
        waiting = cs.state.waiting
        if waiting and waiting.reason == 'CrashLoopBackOff':
            print(f"  {pod.metadata.namespace}/{pod.metadata.name} "
                  f"({cs.name}) - {cs.restart_count} restarts")

Drill 11: Translate a Bash Pipeline (Medium)

Q: Translate to Python: cat /etc/passwd | grep -v '^#' | awk -F: '$7 !~ /nologin|false/ {print $1, $7}' | sort

Answer
results = []
with open("/etc/passwd") as f:
    for line in f:
        line = line.strip()
        if line.startswith("#") or not line:
            continue
        parts = line.split(":")
        if len(parts) >= 7:
            user, shell = parts[0], parts[6]
            if "nologin" not in shell and "false" not in shell:
                results.append((user, shell))

for user, shell in sorted(results):
    print(f"{user} {shell}")

Cheat Sheet

Bash → Python Rosetta Stone

| Bash | Python | Notes |
|---|---|---|
| $var | var | No prefix, no quoting |
| echo "$var" | print(f"{var}") | f-strings |
| ${#string} | len(string) | Works on lists, dicts, strings |
| $((x + 1)) | x + 1 | Math is native |
| [[ $a == $b ]] | a == b | No brackets |
| [ -f file ] | Path(file).is_file() | from pathlib import Path |
| declare -A | mydict = {} | First-class data structure |
| for x in ...; do | for x in ...: | Colon, not semicolon-do |
| while read line | for line in f: | File iteration |
| func() { ... } | def func(): ... | Indentation, not braces |
| $1, $2 | Named params | def f(host, port): |
| $(cmd) | subprocess.run(...) | Prefer native Python |
| cmd \| grep \| awk | for/if/split | Data stays in-process |
| set -e | try/except | Per-operation, specific |
| exit 1 | sys.exit(1) | Or raise exception |
| source file.sh | import module | Namespaced |
| sort \| uniq -c | Counter() | from collections import Counter |
| curl | requests | Sessions, retries, JSON |
| jq | json module | Native data structures |
| find -name '*.log' | Path.rglob('*.log') | Returns Path objects |
| mktemp + trap EXIT | with tempfile: | Cleanup guaranteed |

Quick Commands

# Pretty-print JSON
python3 -m json.tool < file.json

# HTTP server
python3 -m http.server 8000

# Check syntax
python3 -m py_compile script.py

# Profile performance
python3 -m cProfile -s cumtime script.py | head -30

# Create venv
python3 -m venv .venv && source .venv/bin/activate

# Generate password
python3 -c "import secrets; print(secrets.token_urlsafe(32))"

Self-Assessment

Core Language

  • I can write and run a Python script with a shebang
  • I understand types (str, int, float, bool, None) and explicit conversion
  • I can use f-strings for formatted output
  • I can use lists, dicts, tuples, and Counter
  • I understand list comprehensions
  • I can write functions with named arguments and defaults
  • I can use with open() for file I/O
  • I can use try/except for specific error handling
  • I know when Bash is still the right tool vs when to switch to Python

Infrastructure Libraries

  • I can use subprocess.run() safely (list args, check=True, no shell=True)
  • I can use pathlib for file operations instead of os.path
  • I can parse JSON and YAML (with safe_load)
  • I can use requests with sessions, retries, and timeouts
  • I can write a CLI tool with argparse or Click

Cloud and Automation

  • I can use boto3 with paginators and error handling
  • I can run parallel operations with ThreadPoolExecutor
  • I can use the Kubernetes Python client
  • I can generate config files with Jinja2
  • I understand the GIL and when threading vs multiprocessing applies

Production Readiness

  • I use logging instead of print for production scripts
  • I can set up virtual environments and pin dependencies
  • I can use breakpoint() and pdb for debugging
  • I know the major footguns (no timeout, shell=True, yaml.load, no pagination)
  • I can write basic pytest tests for my scripts