Python The Complete Guide
- lesson
- python-basics
- data-structures
- file-i/o
- error-handling
- subprocess
- pathlib
- requests
- boto3
- paramiko
- json/yaml
- regex
- argparse
- logging
- collections
- concurrency
- debugging
- packaging
- virtual-environments
- jinja2
- click
- kubernetes-client
- testing

---

# Python — The Complete Guide: From Zero to Infrastructure Automation
Topics: Python basics, data structures, file I/O, error handling, subprocess, pathlib, requests, boto3, paramiko, JSON/YAML, regex, argparse, logging, collections, concurrency, debugging, packaging, virtual environments, Jinja2, Click, Kubernetes client, testing
Strategy: Build-up from absolute zero (Bash comparison throughout) with war stories, trivia, and drills woven in
Level: L0–L2 (Zero → Foundations → Operations)
Time: 4–5 hours (designed for deep study in one or multiple sittings)
Prerequisites: Familiarity with the Linux command line and Bash scripting. No prior Python experience required.
The Mission¶
You're a terminal-native engineer. You've been writing Bash for years — maybe decades. Your scripts work. They glue infrastructure together with pipes, awk, and sed. But your latest 500-line Bash monitoring script needs JSON parsing, retry logic with exponential backoff, parallel execution across 50 hosts, Slack webhook integration, and proper error handling. You stare at the tangled quoting and subshell variable scoping and realize: this is where Bash stops being the right tool.
By the end of this guide you'll be able to write Python scripts that replace your Bash toolkit, automate cloud infrastructure with real APIs, process data at scale, debug production issues, and build CLI tools your team will actually use. This is the one document you need to go from "I only know Bash" to "I can automate anything in Python."
Table of Contents¶
- Why Python? The Decision Line
- Running the Thing — REPL, Scripts, Shebangs
- Variables and Types — Not Everything Is a String
- f-Strings — printf That Doesn't Hate You
- Data Structures — Lists, Dicts, and Counter
- Conditionals, Loops, and Truthiness
- Functions — No More Subshell Surprises
- File I/O — open(), with, and No More Redirects
- Error Handling — Why Exceptions Beat Exit Codes
- String Methods — Your sed/awk/cut Replacement
- The Import System and Standard Library
- subprocess — The Escape Hatch to Shell
- pathlib — Files Without the Pain
- JSON and YAML — The Languages Infrastructure Speaks
- requests — The Better curl
- Regular Expressions — grep/sed in Python
- Logging — Structured Output for Production
- CLI Tools — argparse and Click
- boto3 — Automating AWS
- paramiko — SSH from Python
- Jinja2 — Templating Config Files
- Concurrency — Threading, Multiprocessing, asyncio
- Data Wrangling — Log Parsing at Scale
- Kubernetes Client — Automating K8s
- Debugging — From print() to pdb to Production
- Virtual Environments and Packaging
- Testing — pytest for Infrastructure Scripts
- Footguns — Mistakes That Turn Automation Into Liability
- Glossary
- Trivia and History
- Flashcard Review
- Drills
- Cheat Sheet
- Self-Assessment
Part 1: Why Python? The Decision Line¶
| Bash Territory | Python Territory |
|---|---|
| One-liner file ops | JSON/YAML/XML parsing |
| Gluing 3–4 commands together | API calls with auth + retries |
| Simple cron jobs | Data structures beyond arrays |
| Config file generation (heredocs) | Error handling with recovery |
| Quick log tailing / grepping | Parallel execution |
| Package install / service restart | Anything over ~100 lines |
| Git hooks | CSV/database operations |
| Environment setup scripts | Unit tests / reusable libraries |
The 100-Line Rule¶
If your Bash script passes 100 lines, ask: "Is this still glue, or is this logic?" Glue connects programs. Logic transforms data, makes decisions, handles errors. Bash is great glue. Bash is terrible logic.
Three Signals It's Time to Switch¶
- You're building data structures. Reaching for `declare -A` and naming conventions like `host_1_ip`, `host_1_port`? Python's dicts and classes will save you hours.
- You're parsing structured data. If you're piping `jq` through `awk` back into `jq`, you're writing a bad Python script in Bash.
- You need error recovery, not just detection. `set -e` exits on failure. `try/except` catches specific errors, retries, falls back, logs context, and continues.
Mental Model: Bash is a text stream processor. Everything is a string. Every tool communicates via text piped between processes. Python is a data structure processor. You parse text into objects once, then work with real types — lists, dicts, integers, booleans. The moment your Bash script starts doing math on strings or building data structures with associative arrays, you've crossed the line into Python territory.
Etymology: Python was created by Guido van Rossum in 1991 and named after Monty Python's Flying Circus, not the snake. The language's design philosophy is captured in "The Zen of Python" (`import this`), which includes "Readability counts" and "There should be one — and preferably only one — obvious way to do it."
Part 2: Running the Thing¶
The REPL — Your New Scratch Terminal¶
# You already do this all day:
$ echo "hello"
hello
# Python has the same thing:
$ python3
>>> print("hello")
hello
>>> 2 + 2
4
>>> exit()
The >>> prompt is Python's interactive shell — the REPL (Read-Eval-Print Loop). It's your bash -c equivalent for testing one-liners.
Scripts and Shebangs¶
Same pattern as Bash: chmod +x script.py, run with ./script.py. The env trick finds whichever python3 is in your $PATH.
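As a sketch of that pattern (the filename and function are made up for illustration), a complete executable script looks like this:

```python
#!/usr/bin/env python3
"""check_disk.py -- a minimal executable script (hypothetical example)."""
import shutil

def disk_usage_percent(path="/"):
    """Return used disk space as a percentage."""
    total, used, _free = shutil.disk_usage(path)
    return round(used / total * 100, 1)

if __name__ == "__main__":
    print(f"Disk usage: {disk_usage_percent()}%")
```

Then `chmod +x check_disk.py` and run `./check_disk.py`, exactly like a Bash script.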
Gotcha: Outside a virtual environment, always use `python3` — the bare `python` command is not uniform across Unix systems (per PEP 394, it may point to Python 2, Python 3, or not exist at all). Inside an activated virtual environment, `python` is fine and usually preferred — the venv guarantees it points to the correct interpreter. Python 2 reached end of life on January 1, 2020.
Quick One-Liners from the Shell¶
# Version check
python3 -c "import sys; print(sys.version)"
# Pretty-print JSON (no jq needed)
python3 -m json.tool < file.json
# Instant HTTP file server
python3 -m http.server 8000
# Generate a random password
python3 -c "import secrets; print(secrets.token_urlsafe(32))"
# Base64 encode
python3 -c "import base64; print(base64.b64encode(b'secret').decode())"
# Check if a module is installed
python3 -c "import boto3; print(boto3.__version__)"
Part 3: Variables and Types¶
In Bash, everything is a string. In Python, data has types:
name = "webserver" # str (string)
count = 42 # int (integer)
uptime = 99.7 # float (decimal)
is_running = True # bool (boolean — capital T)
last_error = None # NoneType (like null — "nothing here")
No $ prefix. No quoting disasters. No declare -i. The value itself tells Python what type it is.
Type Conversion — Explicit Is Better¶
port = "8080" # String from a config file
port_int = int(port) # Now it's an integer
print(port_int + 1) # 8081
int("not_a_number") # ValueError — Python yells immediately
Python yells at you instead of silently doing the wrong thing. This is a feature.
Mental Model: Bash is text-in, text-out. You convert between types by piping strings through commands (`bc`, `awk`, `printf`). In Python, data carries its type with it, and you convert explicitly with `int()`, `str()`, `float()`.
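Bridging the two models in practice, a small helper (a hypothetical sketch, using the try/except syntax that Part 9 covers in depth) converts defensively instead of crashing:

```python
def to_int(value, default=None):
    """Convert to int, returning default on bad input instead of raising."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return default

print(to_int("8080"))              # 8080
print(to_int("not_a_number", 0))   # 0
```

This is the Python equivalent of defensive `${var:-default}` expansions, but it validates the value instead of just substituting an empty string.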
Part 4: f-Strings¶
String formatting in Bash is a minefield of quoting rules. Python f-strings are the fix:
host = "web-03"
port = 8080
print(f"Connecting to {host}:{port}") # Variables
print(f"Status: {port + 1}") # Expressions
print(f"{'HOST':<20} {'PORT':>5}") # Alignment
print(f"Uptime: {99.734:.1f}%") # Decimal places
print(f"Size: {1048576:,} bytes") # Thousands separator
Anything inside {} in an f-string is a Python expression. Variables, math, function calls — all valid. No more escaping nested quotes inside $() inside double quotes inside heredocs.
History: f-strings were introduced in Python 3.6 (2016) via PEP 498. The older formats — `%` formatting (from C's `printf`) and `.format()` — still work but are more verbose. f-strings won because they put the value right next to where it appears.
Part 5: Data Structures¶
Lists — Arrays That Actually Work¶
servers = ["web-01", "web-02", "db-01"]
print(len(servers)) # 3
print(servers[0]) # "web-01"
print(servers[-1]) # "db-01" (negative indexing!)
servers.append("cache-01") # Append
# Filter (list comprehension)
high_ports = [p for p in [80, 443, 8080, 9090] if p > 1024] # [8080, 9090]
# Check membership
if 443 in [80, 443, 8080]:
print("HTTPS is configured")
List comprehensions ([x for x in items if condition]) are Python's pipeline filters — they read left to right, same as cmd | grep condition.
Dicts — The Data Structure That Replaces Half Your Scripts¶
counts = {
"sshd": 47,
"cron": 12,
"kernel": 8,
}
print(counts["sshd"]) # 47
print(counts.get("nginx", 0)) # 0 (default if missing — no crash)
for program, count in counts.items():
print(f"{program}: {count}")
# Sort by value
for prog, n in sorted(counts.items(), key=lambda x: x[1], reverse=True):
print(f"{prog:<15} {n:>5}")
Key insight: `dict.get(key, default)` returns the default instead of crashing on a missing key. This eliminates the most common dict-related bug.
Counter — The awk Killer¶
from collections import Counter
# The Bash way:
# awk '{print $5}' /var/log/syslog | cut -d'[' -f1 | sort | uniq -c | sort -rn
# The Python way:
counts = Counter()
with open("/var/log/syslog") as f:
for line in f:
parts = line.split()
if len(parts) >= 5:
program = parts[4].split("[")[0].rstrip(":")
counts[program] += 1
for program, n in counts.most_common(20):
print(f"{program:<20} {n:>6}")
Five Bash commands piped together → one Python data structure. And counts is an object you can query, filter, serialize to JSON, or combine with another Counter. The Bash pipeline gave you printed text. Python gave you data.
War Story: A team had a 120-line Bash script monitoring log volume across 15 services. It used nested `for` loops, three `declare -A` arrays, and had 9 bugs related to uninitialized array keys (Bash returns empty strings for missing keys, which silently breaks arithmetic). The Python rewrite used `{service: Counter()}` and was 40 lines. The three hardest bugs in the Bash version were impossible in Python because `Counter()` initializes missing keys to zero automatically.
defaultdict — Auto-Initializing Dicts¶
from collections import defaultdict
# Group log lines by status code
lines_by_status = defaultdict(list)
with open('access.log') as f:
for line in f:
status = line.split()[8]
lines_by_status[status].append(line.rstrip())
# No "if key not in dict" boilerplate needed
print(f"Unique 500 errors: {len(lines_by_status['500'])}")
Sets — Fast Membership Tests¶
seen = {"web-01", "web-02"}
if "web-01" in seen:
print("duplicate")
# Sets are O(1) lookup — use for membership tests and deduplication
# Dedup a list while preserving order (Python 3.7+)
hosts = ["web-01", "db-01", "web-01", "cache-01"]
unique = list(dict.fromkeys(hosts)) # ["web-01", "db-01", "cache-01"]
Tuples — Immutable Lists¶
# Tuples can't be changed after creation (immutable)
point = (10, 20)
host_port = ("web-01", 8080)
# Tuple unpacking — use instead of $1, $2
host, port = host_port
print(f"{host}:{port}")
# Multiple return values from functions
total, errors, warnings = analyze_log("/var/log/syslog")
Part 6: Control Flow¶
Conditionals — No More Bracket Roulette¶
if status == "running":
print("Service is up")
elif count > 10:
print("Above threshold")
else:
print("Something else")
No then. No fi. No semicolons. No brackets. Indentation defines the block.
Use is None for None checks, not == None. is tests identity, == tests equality. None is a singleton, so is is both faster and semantically correct:
result = get_server_status()
if result is None: # Correct
print("No response")
if result is not None: # Correct negation
print(f"Got: {result}")
Truthiness¶
# These are "falsy" (treated as False in if statements)
False, 0, 0.0, "", [], {}, None
# Everything else is "truthy"
# So you can write:
if servers: # True if list is non-empty
print("We have servers")
if name: # True if string is non-empty
print(f"Hello, {name}")
Loops¶
# Iterate a list
for server in ["web-01", "web-02", "db-01"]:
print(f"Checking {server}")
# Range
for i in range(10): # 0 through 9
print(i)
# enumerate — index + value (no manual counter)
for i, server in enumerate(servers):
print(f"{i}: {server}")
# zip — walk two lists in lockstep
for host, port in zip(["web-01", "db-01"], [80, 5432]):
print(f"{host}:{port}")
enumerate() and zip() are the two loop tools you'll use most.
File Reading¶
with open("/var/log/syslog") as f:
for line in f:
print(line, end="") # end="" because line already has \n
The with keyword is a context manager — it automatically closes the file when you're done, even if an error occurs. It's Python's version of trap cleanup EXIT, except scoped to one block.
Part 7: Functions¶
Bash functions return exit codes (0–255) and communicate via stdout. Python functions return actual values:
def get_disk_usage(path="/"):
import shutil
total, used, free = shutil.disk_usage(path)
return round(used / total * 100, 1) # Returns a float, not a string
usage = get_disk_usage() # No subshell. No text parsing.
print(f"Usage: {usage}%")
# Named arguments with defaults
def check_port(host, port, timeout=3):
import socket
try:
sock = socket.create_connection((host, port), timeout=timeout)
sock.close()
return True
except (ConnectionRefusedError, TimeoutError):
return False
# Multiple return values
def analyze_log(path):
errors = warnings = total = 0
with open(path) as f:
for line in f:
total += 1
if "ERROR" in line: errors += 1
elif "WARN" in line: warnings += 1
return total, errors, warnings
total, errors, warnings = analyze_log("/var/log/syslog")
| Feature | Bash | Python |
|---|---|---|
| Return data | `echo` + capture with `$()` | `return` value |
| Return type | Always a string (stdout) | Any type: int, str, list, dict, bool |
| Arguments | Positional: `$1`, `$2` | Named, with defaults |
| Scope | Global by default (need `local`) | Local by default |
| Multiple returns | Impossible | `return a, b, c` (tuple) |
Under the Hood: In Bash, `result=$(my_function)` runs the function in a subshell. Variable changes inside `$()` disappear when the subshell exits. This is the #1 source of "why didn't my variable update?" bugs. Python functions share the same process and return data directly.
Type Hints and Dataclasses¶
# Type hints improve readability and tooling
def classify_load(value: float) -> str:
if value >= 10:
return "critical"
if value >= 5:
return "warning"
return "ok"
# Dataclasses for structured data
from dataclasses import dataclass, field
@dataclass
class Host:
name: str
address: str
port: int = 22
tags: list[str] = field(default_factory=list)
Use type hints on function boundaries first — that gets most of the value. Use dataclasses when the data shape matters (named fields, sane defaults, fewer typo bugs).
Mutable Default Arguments¶
# BAD: mutable default argument (shared across all calls!)
def add_host(name, tags=[]):
tags.append(name)
return tags
# GOOD: use None and create fresh
def add_host(name: str, tags: list[str] | None = None) -> list[str]:
tags = [] if tags is None else tags
tags.append(name)
return tags
The default [] is created once at function definition time and shared across every call. Call add_host("a") then add_host("b") and the second call returns ["a", "b"]. This is one of Python's most common footguns.
Part 8: File I/O¶
# Read entire file
with open("/etc/hostname") as f:
content = f.read().strip()
# Process line by line (memory-efficient for huge files)
with open("/etc/passwd") as f:
for line in f:
parts = line.strip().split(":")
print(f"{parts[0]:<20} {parts[-1]}")
# Write to a file
with open("/tmp/config.txt", "w") as f:
f.write("server=web-03\n")
f.write("port=8080\n")
| Mode | Meaning | Bash Equivalent |
|---|---|---|
| `"r"` | Read (default) | `< file` |
| `"w"` | Write (truncates!) | `> file` |
| `"a"` | Append | `>> file` |
Gotcha: `"w"` mode truncates the file immediately on open — before you write anything. For atomic writes, write to a temp file then rename it (same pattern as safe config updates in Bash).
Atomic File Writes¶
import os
import tempfile
from pathlib import Path
def atomic_write(path, content):
    """Write content to a file atomically — safe for config files."""
    path = Path(path)
    fd, tmp_path = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:  # write via the open descriptor, no fd leak
            f.write(content)
        Path(tmp_path).replace(path)   # atomic on the same filesystem
    except Exception:
        Path(tmp_path).unlink(missing_ok=True)
        raise
Part 9: Error Handling¶
In Bash, error handling is set -e and prayer. Python has try/except — structured, specific, and reliable:
try:
with open("/var/log/syslog") as f:
content = f.read()
except FileNotFoundError:
print("Syslog not found (are you on macOS?)")
content = ""
except PermissionError:
print("Permission denied — run with sudo?")
content = ""
# Catch, log, and continue (what set -e can't do)
results = {}
for server in ["web-01", "web-02", "web-03"]:
try:
results[server] = check_server(server)
except ConnectionError as e:
print(f"WARN: {server} unreachable: {e}")
results[server] = None
# Script continues instead of dying
The Exception Hierarchy You Need¶
BaseException
└── Exception
├── FileNotFoundError # File doesn't exist
├── PermissionError # Can't read/write
├── ValueError # Wrong value (int("abc"))
├── KeyError # Dict key doesn't exist
├── IndexError # List index out of range
├── TypeError # Wrong type (len(42))
├── ConnectionError # Network failure
├── TimeoutError # Operation timed out
└── KeyboardInterrupt # Ctrl+C
try / except / else / finally¶
try:
f = open("/var/log/syslog")
data = f.read()
except FileNotFoundError:
print("File missing")
data = ""
else:
# Only runs if NO exception occurred
print(f"Read {len(data)} bytes")
finally:
# ALWAYS runs — cleanup goes here
print("Done")
block/rescue Equivalent¶
# Ansible-style block/rescue pattern
try:
deploy_application()
verify_health()
except Exception:
rollback_application()
notify_team("Deploy failed, rolled back")
finally:
log_deployment_attempt()
Mental Model: `set -e` is a fire alarm — when something goes wrong, everybody evacuates. `try/except` is a fire extinguisher — you identify what's burning, put it out, and keep working.
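The mission mentioned retry logic with exponential backoff; try/except makes it a short, reusable helper. A sketch (defaults and exception list are illustrative, not prescriptive):

```python
import time

def retry(func, attempts=3, base_delay=1.0,
          exceptions=(ConnectionError, TimeoutError)):
    """Call func(); on the listed exceptions, back off 1s, 2s, 4s... then re-raise."""
    for attempt in range(attempts):
        try:
            return func()
        except exceptions:
            if attempt == attempts - 1:
                raise  # out of retries -- let the caller handle it
            time.sleep(base_delay * (2 ** attempt))
```

Usage: `retry(lambda: check_server("web-01"))` wraps any flaky call without touching its code — something `set -e` simply cannot express.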
Part 10: String Methods¶
Python strings have methods that replace most sed/awk/cut one-liners:
line = " Mar 23 04:12:03 web-prod-03 sshd[28410]: Failed password "
line.strip() # Remove leading/trailing whitespace
line.split() # Split on whitespace (like awk default)
line.split(":") # Split on colons (like cut -d':')
", ".join(["web-01", "web-02"]) # "web-01, web-02"
"sshd[28410]:".startswith("sshd") # True
"Hello World".replace("World", "Ops") # "Hello Ops"
"Failed password" in line # True (substring check)
"warning".upper() # "WARNING"
# Chaining — extract program name from syslog line
# Bash: echo "$line" | awk '{print $5}' | cut -d'[' -f1 | tr -d ':'
program = line.split()[4].split("[")[0].rstrip(":")
# "sshd"
One line, no pipes, no subprocesses, and it returns a Python string you can use directly in a dict, comparison, or f-string.
Fact: Python strings are immutable — every method returns a new string. The original is unchanged. This means you can never accidentally corrupt data by modifying it in two places at once.
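You can verify this in the REPL:

```python
s = "warning"
t = s.upper()   # returns a NEW string
print(s)        # warning  (original untouched)
print(t)        # WARNING
```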
Part 11: Imports and the Standard Library¶
# Standard library — ships with Python, no install required
import os # OS interactions (env vars, paths, PIDs)
import sys # Python runtime (args, exit, stdin/stdout)
import json # JSON parsing/writing
import re # Regular expressions
import subprocess # Run shell commands
import datetime # Dates and times
import collections # Counter, defaultdict
import socket # Low-level networking
import shutil # File copying, disk usage
import csv # CSV file reading/writing
import hashlib # Hashing (md5, sha256)
import argparse # CLI argument parsing
import logging # Structured logging
import tempfile # Temporary files
from pathlib import Path # File path operations
# Third-party — install with pip
import requests # HTTP requests (better curl)
import yaml # YAML parsing
import boto3 # AWS SDK
"Batteries included" — Python's standard library ships with modules for nearly everything. The phrase was coined in the late 1990s. On a server where you can't install packages, you still have
json,csv,re,pathlib,subprocess,logging, and more.
The Main Guard¶
#!/usr/bin/env python3
def main():
print("This only runs when executed directly")
if __name__ == "__main__":
main()
When Python runs a file directly, __name__ is "__main__". When imported as a module, it's the module name. This lets your file work as both a script and a reusable library.
Part 12: subprocess — The Escape Hatch¶
import json
import subprocess
# Simple command
result = subprocess.run(
["df", "-h", "/"],
capture_output=True,
text=True,
check=True, # Raise on non-zero exit
)
print(result.stdout)
# Parse JSON output from a command
result = subprocess.run(
["kubectl", "get", "pods", "-o", "json"],
capture_output=True, text=True, check=True,
)
pods = json.loads(result.stdout)
# With timeout
result = subprocess.run(
["helm", "list"],
capture_output=True, text=True, timeout=30,
)
# Stream output in real time
process = subprocess.Popen(
["ansible-playbook", "site.yml"],
stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True,
)
for line in process.stdout:
print(line, end='')
The shell=True Footgun¶
# NEVER DO THIS with user input:
subprocess.run(f"ping -c 1 {hostname}", shell=True) # Shell injection!
# What if hostname is "google.com; rm -rf /"?
# SAFE: pass arguments as a list
subprocess.run(["ping", "-c", "1", hostname])
When to Use subprocess vs Native Python¶
| Use subprocess | Use native Python |
|---|---|
| `systemctl restart nginx` | Parsing a file (`open()`) |
| `iptables -L` | HTTP requests (`requests`) |
| `docker ps` | JSON parsing (`json`) |
| `git log --oneline` | File operations (`pathlib`) |
| `aws` CLI (quick one-offs) | String matching (`re`) |
Gotcha: The most common mistake when Bash experts start writing Python is calling `subprocess.run()` for everything. If you're writing `subprocess.run(["grep", "ERROR", logfile])`, you're paying Python's overhead without getting its benefit. Use a comprehension instead: `[line for line in open(logfile) if "ERROR" in line]`.
Part 13: pathlib — Files Without the Pain¶
from pathlib import Path
# Path manipulation (no more os.path.join())
config = Path("/etc/myapp") / "conf.d" / "upstream.yaml"
backup = config.with_suffix(".yaml.bak")
# Properties
config.parent # PosixPath('/etc/myapp/conf.d')
config.name # 'upstream.yaml'
config.stem # 'upstream'
config.suffix # '.yaml'
# Check existence
config.exists()
config.is_file()
config.is_dir()
# Read/write
content = config.read_text()
config.write_text("new content")
# Create directories
Path("/backup/myapp").mkdir(parents=True, exist_ok=True)
# Find files (like find command)
for log in Path("/var/log").glob("*.log"):
size_mb = log.stat().st_size / (1024 * 1024)
if size_mb > 100:
print(f"Large log: {log} ({size_mb:.1f} MB)")
# Recursive glob
for yaml_file in Path("/etc").rglob("*.yaml"):
print(yaml_file)
Trivia: The `/` operator for paths was added in Python 3.4 (2014). It works by overriding `__truediv__`, the same method that handles `a / b` for numbers.
Part 14: JSON and YAML¶
JSON¶
import json
# Parse from string
data = json.loads('{"status": "healthy", "uptime": 84600}')
# Parse from file
with open("response.json") as f:
data = json.load(f)
# Write (pretty-printed)
print(json.dumps(data, indent=2))
# Navigate nested structures
pod_name = data["metadata"]["name"]
node = data["status"].get("hostIP", "unknown") # Safe with default
YAML¶
import yaml # pip install pyyaml
# Read a Kubernetes manifest
with open("deployment.yaml") as f:
manifest = yaml.safe_load(f)
# Multi-document YAML (multiple --- separated docs)
with open("all-resources.yaml") as f:
for doc in yaml.safe_load_all(f):
if doc:
print(f"{doc.get('kind')}: {doc['metadata']['name']}")
Security: Always use `yaml.safe_load()`, never `yaml.load()`. The unsafe version can execute arbitrary Python code embedded in YAML. This is a known attack vector, not a theoretical risk. If you see `yaml.load(f)` without a `Loader` argument, that's a security bug.
YAML's Type Surprises¶
gotchas = yaml.safe_load("""
norway_code: NO # boolean False (YAML 1.1!)
version: 1.10 # float 1.1 (trailing zero dropped!)
port: 8080 # integer (fine)
""")
# "NO" becomes False, "1.10" becomes 1.1
# Always quote strings that could be misinterpreted
The Norway Problem: Country code `NO` being parsed as boolean `False` has caused real deployment failures. YAML 1.2 fixed this, but PyYAML still implements YAML 1.1.
TOML¶
Python 3.11+ includes tomllib in the standard library for reading TOML files (used by pyproject.toml):
import tomllib
with open("pyproject.toml", "rb") as f:
config = tomllib.load(f)
print(config["project"]["name"])
Note: `tomllib` is read-only. If you need to write TOML, use the third-party `tomli-w` package.
Part 15: requests — The Better curl¶
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
# Basic GET
response = requests.get("http://api.internal:8080/health", timeout=10)
print(response.status_code) # 200
print(response.json()) # Parsed JSON as a dict
# POST with JSON body
response = requests.post(
"https://api.example.com/deploy",
json={"version": "v2.0", "env": "prod"},
headers={"Authorization": "Bearer mytoken"},
timeout=10,
)
response.raise_for_status() # Raises HTTPError for 4xx/5xx
Sessions with Retries — The Non-Negotiable Pattern¶
def get_session(retries=3, backoff_factor=0.5):
"""Create a requests session with automatic retries."""
session = requests.Session()
retry = Retry(
total=retries,
backoff_factor=backoff_factor, # 0.5s, 1s, 2s between retries
status_forcelist=[500, 502, 503, 504],
allowed_methods=["GET", "HEAD"],
)
adapter = HTTPAdapter(max_retries=retry)
session.mount("http://", adapter)
session.mount("https://", adapter)
return session
session = get_session()
data = session.get("http://prometheus.internal:9090/api/v1/targets", timeout=10).json()
War Story: A monitoring script checked 40 endpoints every 60 seconds with no retry logic. During a 3-second 502 from a routine deploy, it fired 40 "service down" alerts to Slack and paged the on-call at 3 AM. After adding retries with 2-second backoff, false alerts dropped 90%.
Gotcha: `requests.get(url)` with no timeout blocks forever if the server doesn't respond. Your cron job piles up. You now have 30 zombie Python processes. Always set `timeout=`.
Part 16: Regular Expressions¶
import re
# Simple match (like grep)
if re.search(r"Failed password", line):
print("SSH failure detected")
# Extract groups (like sed capture groups)
match = re.search(r"from (\d+\.\d+\.\d+\.\d+) port (\d+)", line)
if match:
ip = match.group(1)
port = match.group(2)
# Find all matches
ips = re.findall(r"\d+\.\d+\.\d+\.\d+", log_content)
# Replace (like sed 's/old/new/g')
cleaned = re.sub(r"\s+", " ", messy_text)
# Compile for performance (reuse the pattern)
LOG_PATTERN = re.compile(r"^(\S+) - - \[(.+?)\] \"(\S+) (\S+)")
for line in open("access.log"):
match = LOG_PATTERN.match(line)
if match:
ip, timestamp, method, path = match.groups()
Part 17: Logging¶
import logging
import sys
def setup_logging(verbose=False):
level = logging.DEBUG if verbose else logging.INFO
logging.basicConfig(
level=level,
format='%(asctime)s %(levelname)s %(message)s',
datefmt='%Y-%m-%d %H:%M:%S',
handlers=[logging.StreamHandler(sys.stderr)],
)
return logging.getLogger(__name__)
log = setup_logging()
log.info("Starting backup for %d hosts", len(hosts))
log.warning("Host %s unreachable: %s", host, error)
log.error("Backup failed: %s", str(e))
JSON Logging (for Monitoring Pipelines)¶
import json
from datetime import datetime, timezone
def log_json(event, **kwargs):
    entry = {'event': event, 'timestamp': datetime.now(timezone.utc).isoformat()}
entry.update(kwargs)
print(json.dumps(entry), file=sys.stderr)
log_json("backup_complete", hosts=5, duration_seconds=142)
Note: `datetime.utcnow()` is deprecated in Python 3.12+. Use `datetime.now(timezone.utc)` (or `datetime.now(UTC)` on 3.11+) instead — it returns a timezone-aware datetime, which prevents a whole class of "naive vs aware" comparison bugs.
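A quick sketch of the timezone-aware pattern (`timezone.utc` works on every Python 3 version):

```python
from datetime import datetime, timezone

now = datetime.now(timezone.utc)   # aware: carries its UTC offset
print(now.isoformat())             # e.g. 2025-03-23T04:12:03.123456+00:00
```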
Part 18: CLI Tools¶
argparse (Standard Library)¶
#!/usr/bin/env python3
"""Morning infrastructure health check."""
import argparse
import os
def main():
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument('--env', required=True, choices=['dev', 'staging', 'prod'])
parser.add_argument('--dry-run', action='store_true')
parser.add_argument('--verbose', '-v', action='count', default=0)
parser.add_argument('--timeout', type=int,
default=int(os.environ.get('TIMEOUT', '30')))
args = parser.parse_args()
# args.env, args.dry_run, args.verbose, args.timeout
if __name__ == '__main__':
main()
Click (Third-Party, More Powerful)¶
import click
@click.group()
@click.option('--verbose', '-v', is_flag=True)
@click.pass_context
def cli(ctx, verbose):
"""Infrastructure management tool."""
ctx.ensure_object(dict)
ctx.obj['verbose'] = verbose
@cli.command()
@click.argument('environment', type=click.Choice(['dev', 'staging', 'prod']))
@click.option('--region', '-r', default='us-east-1')
def list_servers(environment, region):
"""List servers in an environment."""
servers = get_instances_by_tag('Environment', environment)
for s in servers:
click.echo(f"{s['id']:<22} {s['ip']:<16} {s['type']}")
@cli.command()
@click.argument('instance_id')
@click.confirmation_option(prompt='Stop this instance?')
def stop(instance_id):
"""Stop an EC2 instance."""
stop_instance(instance_id)
Configuration Precedence Pattern¶
Resolve each setting in order: CLI flag, then environment variable, then config file, then built-in default. This matches how every serious CLI tool (kubectl, aws, terraform) works.
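A minimal sketch of that precedence (the function, env-var name, and JSON config format are all invented for illustration):

```python
import json
import os
from pathlib import Path

def resolve_setting(name, cli_value=None, env_var=None,
                    config_file=None, default=None):
    """CLI flag > environment variable > config file > built-in default."""
    if cli_value is not None:          # 1. explicit flag wins
        return cli_value
    if env_var and os.environ.get(env_var):
        return os.environ[env_var]     # 2. environment variable
    if config_file and Path(config_file).exists():
        config = json.loads(Path(config_file).read_text())
        if name in config:
            return config[name]        # 3. config file
    return default                     # 4. built-in default

region = resolve_setting("region", env_var="MYAPP_REGION", default="us-east-1")
```

In an argparse tool, pass `cli_value=args.region` so unset flags (which are `None`) fall through to the next layer.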
Part 19: boto3 — Automating AWS¶
import boto3
from botocore.exceptions import ClientError
ec2 = boto3.client('ec2', region_name='us-east-1')
s3 = boto3.client('s3')
# List instances by tag
def get_instances_by_tag(tag_key, tag_value):
response = ec2.describe_instances(
Filters=[
{'Name': f'tag:{tag_key}', 'Values': [tag_value]},
{'Name': 'instance-state-name', 'Values': ['running']},
]
)
instances = []
for reservation in response['Reservations']:
for instance in reservation['Instances']:
instances.append({
'id': instance['InstanceId'],
'ip': instance.get('PrivateIpAddress'),
'type': instance['InstanceType'],
})
return instances
# CRITICAL: Paginate all list operations
def list_all_s3_objects(bucket, prefix=''):
paginator = s3.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
for obj in page.get('Contents', []):
yield obj['Key'], obj['Size']
# Error handling
try:
ec2.stop_instances(InstanceIds=[instance_id])
except ClientError as e:
if e.response['Error']['Code'] == 'InvalidInstanceID.NotFound':
print(f"Instance {instance_id} not found")
else:
raise
Gotcha: boto3 reads credentials in this order: (1) explicit parameters, (2) environment variables, (3) `~/.aws/credentials`, (4) EC2 instance metadata / ECS task role. Never hardcode credentials in code.
Gotcha: AWS APIs return at most 100–1,000 results per call (the limit varies by API). If you have 5,000 instances and don't paginate, you only see the first 1,000. Use paginators for every `describe_*`, `list_*`, `get_*` call.
Trivia: boto3 is the most-used AWS SDK in any language, with over 1 billion downloads per month from PyPI.
Part 20: paramiko — SSH from Python¶
import paramiko
def run_remote_command(host, user, key_path, command):
"""Run a command on a remote host via SSH."""
client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
try:
client.connect(hostname=host, username=user,
key_filename=key_path, timeout=10)
stdin, stdout, stderr = client.exec_command(command, timeout=30)
exit_code = stdout.channel.recv_exit_status()
return {
'host': host,
'stdout': stdout.read().decode().strip(),
'stderr': stderr.read().decode().strip(),
'exit_code': exit_code,
}
finally:
client.close() # Always close, even on exception
Gotcha: If you don't close paramiko connections on exceptions, after 200 hosts you hit the file descriptor limit. Always use `try/finally`.
Security: In production, avoid `AutoAddPolicy()` — it accepts any host key without verification, making you vulnerable to MITM attacks. Use `RejectPolicy()` or `WarningPolicy()` and manage known hosts properly (e.g., `client.load_system_host_keys()`).
Part 21: Jinja2 — Templating Config Files¶
from jinja2 import Template, Environment, FileSystemLoader
# Inline template
tmpl = Template("Hello {{ name }}")
print(tmpl.render(name="world"))
# From files
env = Environment(loader=FileSystemLoader('templates/'),
trim_blocks=True, lstrip_blocks=True)
tmpl = env.get_template('nginx.conf.j2')
config = tmpl.render(
service_name='myapp',
backends=[
{'ip': '10.0.1.10', 'port': 8080, 'weight': 100},
{'ip': '10.0.1.11', 'port': 8080, 'weight': 100},
],
domain='app.example.com',
)
# K8s manifest generation
K8S_TEMPLATE = Template("""
apiVersion: apps/v1
kind: Deployment
metadata:
name: {{ name }}
spec:
replicas: {{ replicas }}
template:
spec:
containers:
- name: {{ name }}
image: {{ image }}:{{ tag }}
""")
for svc in services:
print("---")
print(K8S_TEMPLATE.render(**svc))
Part 22: Concurrency¶
The GIL (Global Interpreter Lock)¶
The GIL allows only one thread to execute Python bytecode at a time. But it's released during I/O (network, disk, sleep). So:
| Workload | GIL Impact | Right Tool |
|---|---|---|
| I/O-bound (HTTP, SSH, file I/O) | Minimal | threading or asyncio |
| CPU-bound (math, parsing) | Severe — threads give zero speedup | multiprocessing |
Key fact for DevOps: Infrastructure scripts are almost always I/O-bound. The GIL does not matter for your work. Threading works great.
ThreadPoolExecutor — Parallel Fleet Operations¶
from concurrent.futures import ThreadPoolExecutor, as_completed
def check_host_health(host):
try:
resp = requests.get(f'http://{host}:8080/health', timeout=5)
return {'host': host, 'healthy': resp.ok}
except requests.exceptions.RequestException as e:
return {'host': host, 'healthy': False, 'error': str(e)}
def parallel_health_check(hosts, max_workers=20):
results = []
with ThreadPoolExecutor(max_workers=max_workers) as executor:
future_to_host = {
executor.submit(check_host_health, host): host
for host in hosts
}
for future in as_completed(future_to_host):
results.append(future.result())
return results
# 200 hosts checked in parallel — seconds instead of minutes
results = parallel_health_check(all_hosts, max_workers=30)
unhealthy = [r for r in results if not r['healthy']]
Fleet Operation Pattern¶
def fleet_operation(hosts, operation, max_workers=20, fail_fast=False):
"""Run an operation across a fleet of hosts in parallel."""
results = {'success': [], 'failed': []}
with ThreadPoolExecutor(max_workers=max_workers) as executor:
futures = {executor.submit(operation, host): host for host in hosts}
for future in as_completed(futures):
host = futures[future]
try:
result = future.result()
results['success'].append({'host': host, 'result': result})
except Exception as e:
results['failed'].append({'host': host, 'error': str(e)})
if fail_fast:
executor.shutdown(wait=False, cancel_futures=True)
break
return results
Part 23: Data Wrangling — Log Parsing at Scale¶
Reading Large Files Without Killing the Server¶
# BAD: loads 800 MB into RAM (plus ~3x object overhead)
lines = open('access.log').readlines()
# GOOD: streams line by line — constant memory
with open('access.log') as f:
for line in f:
process(line)
# Gzipped files
import gzip
with gzip.open('access.log.gz', 'rt', errors='replace') as f:
for line in f:
process(line)
Multi-Aggregation in One Pass¶
from collections import Counter
from pathlib import Path
import gzip
ip_counts = Counter()
status_counts = Counter()
endpoint_counts = Counter()
for log_file in sorted(Path('/var/log/nginx').glob('access.log.*.gz')):
with gzip.open(log_file, 'rt', errors='replace') as f:
for line in f:
parts = line.split()
if len(parts) >= 10:
ip_counts[parts[0]] += 1
status_counts[parts[8]] += 1
endpoint_counts[parts[6]] += 1
# Three aggregations, one pass, constant memory
print("Top IPs:", ip_counts.most_common(10))
print("Status codes:", status_counts.most_common())
Generators for Memory-Efficient Pipelines¶
def grep_file(filepath, pattern):
"""Memory-efficient file search — like grep but returns structured data."""
with open(filepath) as f:
for line_num, line in enumerate(f, 1):
if pattern in line:
yield line_num, line.rstrip()
# Lazy evaluation — only processes what's needed
for num, line in grep_file("/var/log/syslog", "ERROR"):
print(f"{num}: {line}")
Part 24: Kubernetes Client¶
from kubernetes import client, config # pip install kubernetes
# Load kubeconfig
try:
config.load_incluster_config() # Inside a pod
except config.ConfigException:
config.load_kube_config() # From ~/.kube/config
v1 = client.CoreV1Api()
# List pods
pods = v1.list_namespaced_pod('default')
for p in pods.items:
print(f"{p.metadata.name}: {p.status.phase}")
# Find CrashLoopBackOff pods across all namespaces
all_pods = v1.list_pod_for_all_namespaces()
for pod in all_pods.items:
for cs in (pod.status.container_statuses or []):
waiting = cs.state.waiting
if waiting and waiting.reason == 'CrashLoopBackOff':
print(f" {pod.metadata.namespace}/{pod.metadata.name} "
f"({cs.name}) - {cs.restart_count} restarts")
Part 25: Debugging¶
print() — The Universal First Step¶
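Two tricks make debug prints far more useful; a minimal sketch (variable names are illustrative). The `=` specifier in f-strings (Python 3.8+) prints both the expression and its value, and `!r` shows the repr so type surprises become visible:

```python
import sys

port = "8080"                    # looks like a number, is actually a string
response = {"status": 503}

# f-string '=' specifier (3.8+): prints the expression AND its value
print(f"{response=}", file=sys.stderr)      # response={'status': 503}

# !r shows the repr: '8080' vs 8080, trailing spaces, None vs 'None'
print(f"port={port!r}", file=sys.stderr)    # port='8080', so it's a string!
```

Printing to stderr keeps debug output out of any pipeline consuming your script's stdout.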
breakpoint() — The Built-In Debugger¶
def process_data(records):
for record in records:
breakpoint() # Drops into pdb interactive debugger
transform(record)
Essential pdb Commands¶
| Command | Short | What It Does |
|---|---|---|
| next | n | Execute next line (step over) |
| step | s | Step into function call |
| continue | c | Continue until next breakpoint |
| print expr | p expr | Print expression value |
| list | l | Show source code around current line |
| where | w | Show call stack |
| quit | q | Quit debugger |
# Disable all breakpoints via environment
PYTHONBREAKPOINT=0 python3 script.py
# Use ipdb instead of pdb (better UI)
PYTHONBREAKPOINT=ipdb.set_trace python3 script.py
Remote Debugging (Docker, Production)¶
import debugpy
debugpy.listen(("0.0.0.0", 5678))
print("Waiting for debugger...")
debugpy.wait_for_client()
Profiling¶
# Find slow functions
python3 -m cProfile -s cumtime myscript.py 2>&1 | head -30
# Check syntax without running
python3 -m py_compile myscript.py
Part 26: Virtual Environments and Packaging¶
Virtual Environments¶
# Create
python3 -m venv .venv
# Activate
source .venv/bin/activate
# Install packages (goes into .venv only)
pip install requests boto3
# Freeze dependencies
pip freeze > requirements.txt
# Deactivate
deactivate
A venv is just a directory. Delete it and you're clean. Never commit .venv/ to git.
requirements.txt¶
# Direct dependencies (loose — for libraries)
requests>=2.28
flask>=3.0
# Pinned (reproducible — for applications)
requests==2.31.0
flask==3.0.2
Werkzeug==3.0.1
pip-tools — The Better Way¶
pip install pip-tools
# requirements.in — what you WANT
cat requirements.in
# requests>=2.28
# flask>=3.0
# Compile to pinned requirements.txt — what you GET
pip-compile requirements.in
# Install exactly what's pinned
pip-sync requirements.txt
pyproject.toml (Modern Standard)¶
[project]
name = "my-infra-tool"
version = "1.0.0"
requires-python = ">=3.11"
dependencies = [
"requests>=2.28",
"boto3>=1.28",
"click>=8.0",
]
[project.scripts]
infra-check = "my_tool.cli:main"
uv — The Future¶
# uv is a Rust-based replacement for pip, pip-tools, virtualenv, and pyenv
# 10-100x faster than pip
pip install uv
uv pip install requests
uv venv .venv
uv pip compile requirements.in
Trivia: pip didn't exist until 2008. Before that, packages were installed with easy_install, which couldn't even uninstall packages. The name "pip" is recursive: "pip installs packages."
Part 27: Testing¶
# test_health.py
import pytest
def test_parse_health_response():
response = {"status": "healthy", "uptime": 84600}
assert response["status"] == "healthy"
assert response["uptime"] > 0
def test_parse_unhealthy_response():
response = {"status": "degraded", "errors": ["disk_full"]}
assert response["status"] != "healthy"
assert len(response["errors"]) > 0
def test_missing_key_uses_default():
response = {}
status = response.get("status", "unknown")
assert status == "unknown"
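The assertions above exercise literal dicts; the real value comes from testing your own functions. A hedged sketch (parse_config is a hypothetical helper) using pytest's built-in tmp_path fixture, which gives each test a fresh temporary directory:

```python
# test_config.py - testing a real function instead of a literal
import json

def parse_config(path):
    """Code under test: load a JSON config and fill in a default."""
    with open(path) as f:
        cfg = json.load(f)
    cfg.setdefault("timeout", 30)
    return cfg

def test_parse_config_applies_default(tmp_path):
    # tmp_path is a pathlib.Path to a per-test temporary directory
    cfg_file = tmp_path / "config.json"
    cfg_file.write_text('{"host": "web-01"}')
    cfg = parse_config(cfg_file)
    assert cfg["host"] == "web-01"
    assert cfg["timeout"] == 30     # default filled in
```

Run with `pytest test_config.py`; pytest discovers any function named `test_*` automatically.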
Part 28: Footguns — Mistakes That Turn Automation Into Liability¶
1. Hardcoding AWS Credentials¶
Your aws_access_key_id in a Python file gets committed to Git. Someone runs trufflehog. AWS sends you a bill for 200 GPU instances mining crypto.
Fix: Use environment variables, AWS profiles, or IAM roles. boto3 checks ~/.aws/credentials and instance metadata automatically.
2. No Timeout on HTTP Requests¶
requests.get(url) with no timeout blocks forever. Your cron job piles up. 30 zombie Python processes consuming memory.
Fix: Always timeout=(5, 30) — 5s to connect, 30s to read.
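A minimal sketch of the fix, assuming requests is installed (the URL is illustrative):

```python
import requests

def fetch_health(url):
    try:
        # (connect timeout, read timeout): fail in seconds, not never
        resp = requests.get(url, timeout=(5, 30))
        return resp.ok
    except requests.exceptions.Timeout:
        return False
```

Timeout is a subclass of RequestException, so a broad `except requests.exceptions.RequestException` also catches it.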
3. subprocess with shell=True and User Input¶
Shell injection vulnerability. A hostname containing ; rm -rf / gets executed.
Fix: Pass arguments as a list: subprocess.run(["ping", "-c", "1", hostname]).
4. Not Paginating AWS API Calls¶
Script works in dev (10 instances), returns wrong results in prod (2,000 instances) — only first 1,000 visible.
Fix: Use paginators for every AWS list operation.
5. Non-Atomic File Writes¶
Process killed mid-write → half-written config → service crashes.
Fix: Write to temp file, then rename. rename() on same filesystem is atomic on Linux.
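A sketch of the temp-file-plus-rename pattern using only the standard library (the fsync call is optional, but it makes the write durable as well as atomic):

```python
import os
import tempfile

def atomic_write(path, data):
    """Write data to path atomically: temp file in the same directory, then rename."""
    dirname = os.path.dirname(os.path.abspath(path))
    # Temp file must live on the SAME filesystem for the rename to be atomic
    fd, tmp = tempfile.mkstemp(dir=dirname, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())   # ensure bytes hit disk before the rename
        os.replace(tmp, path)      # atomic on POSIX; overwrites any existing file
    except BaseException:
        os.unlink(tmp)             # clean up the temp file on any failure
        raise
```

Readers see either the complete old file or the complete new file, never a half-written one.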
6. Catching Exception Instead of Specific Exceptions¶
except Exception: pass silently swallows NameErrors, ConnectionErrors, everything. Script reports success when it did nothing.
Fix: Catch specific exceptions. Let unexpected ones crash loudly.
7. Sequential Fleet Operations¶
SSHing into 500 servers one at a time. Each takes 2-3 seconds. Script takes 25 minutes.
Fix: ThreadPoolExecutor(max_workers=20). Same operation in under a minute.
8. Loading Entire Large Files into Memory¶
f.readlines() on a 5 GB file. Python uses ~15 GB (object overhead). OOM killer terminates your app.
Fix: Stream line by line inside a context manager: with open(path) as f: then for line in f.
9. Using os.system Instead of subprocess¶
os.system("systemctl restart nginx") — can't capture stdout, can't get exit code reliably, can't handle arguments safely.
Fix: subprocess.run() with capture_output=True and check=True.
10. No Logging in Automation Scripts¶
Script runs via cron, fails. Nobody knows because it only printed to stdout and nobody reads root's email. Failing for 2 weeks.
Fix: Use logging module. Log to stderr with timestamps and severity levels.
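A minimal sketch of the fix (logger name and messages are illustrative):

```python
import logging
import sys

logging.basicConfig(
    stream=sys.stderr,     # cron captures stderr; stdout pipelines stay clean
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
log = logging.getLogger("nightly-backup")

log.info("starting backup of %s", "/var/lib/postgresql")
log.error("upload failed after 3 retries")
```

Each line now carries a timestamp and severity, so a two-week-old failure is obvious the moment anyone looks at the log.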
11. yaml.load() Instead of yaml.safe_load()¶
Security vulnerability — can execute arbitrary Python code from YAML. Common audit finding.
Fix: Always yaml.safe_load().
Glossary¶
| Term | Definition |
|---|---|
| Python | Interpreted, dynamically-typed language. Named after Monty Python, not the snake |
| REPL | Read-Eval-Print Loop — the >>> interactive prompt |
| f-string | Formatted string literal: f"Hello {name}" (Python 3.6+) |
| list | Ordered, mutable collection: [1, 2, 3] |
| dict | Key-value mapping: {"host": "web-01", "port": 80} |
| tuple | Ordered, immutable collection: (1, 2, 3) |
| Counter | Dict subclass for counting: Counter(words).most_common(10) |
| defaultdict | Dict with auto-initialized missing keys |
| list comprehension | Inline list creation: [x for x in items if condition] |
| generator | Lazy iterator using yield — processes one item at a time |
| exception | Error object for control flow: try/except |
| context manager | with statement — guarantees cleanup (file close, lock release) |
| decorator | Function wrapper: @retry(max_attempts=3) |
| venv | Virtual environment — isolated per-project dependencies |
| pip | Package installer: pip install requests |
| module | Importable Python file |
| package | Directory of modules (has __init__.py) |
| GIL | Global Interpreter Lock — one thread runs Python bytecode at a time. Released during I/O |
| breakpoint() | Built-in to enter the debugger (Python 3.7+) |
| pdb | Python's built-in interactive debugger |
| pathlib | Object-oriented file path handling: Path("/etc") / "nginx" |
| boto3 | AWS SDK for Python |
| requests | HTTP library — the better curl |
| PyYAML | YAML parser. Always use safe_load() |
| Jinja2 | Templating engine: {{ variable }}, {% for %} |
| Click | Decorator-based CLI framework |
| argparse | Standard library CLI argument parser |
| subprocess | Run shell commands from Python. Never shell=True with user input |
| idempotent | Safe to run multiple times without changing result |
| timeout | Upper bound on waiting — prevents hangs. Always set one |
Trivia and History¶
-
Named after comedy, not a snake. Guido van Rossum named Python after Monty Python's Flying Circus. The docs use "spam," "eggs," and "ham" as variable names (from the Monty Python sketch) instead of "foo" and "bar."
-
Christmas 1989 hobby project. Guido started Python during Christmas week 1989 as a successor to the ABC language. First public release (0.9.0) came in February 1991.
-
The Benevolent Dictator. Guido held the title "Benevolent Dictator for Life" (BDFL) until he resigned in July 2018 after the contentious PEP 572 (walrus operator :=) debate. Python is now governed by a five-person Steering Council.
-
The Zen of Python. Type import this in a Python interpreter to see 19 aphorisms by Tim Peters (PEP 20, 1999). The 20th was intentionally left blank.
-
Indentation by design. Python's significant whitespace was deliberate, inspired by ABC and Donald Knuth's literate programming. Guido argued that since programmers indent anyway, the language should enforce it.
-
The GIL controversy. The Global Interpreter Lock (added in 1992) prevents true multi-threaded parallelism. PEP 703 (accepted 2023) began the multi-year project to make it optional (expected ~Python 3.15+).
-
The 12-year migration. Python 3.0 was released December 2008. Python 2.7 was sunset January 1, 2020 — a 12-year transition that became a cautionary tale about breaking backward compatibility.
-
Python was infra before it was web. Guido created Python as a system administration scripting language in 1991. Web frameworks (Django 2005, Flask 2010) came much later. Python's first major use was file management and system scripting.
-
Ansible's secret weapon. Ansible chose Python because it's installed by default on virtually every Linux distribution. Modules execute using the system Python — no agent needed. This "agentless" architecture was only possible because of Python's ubiquity.
-
subprocess replaced five modules. Python's subprocess (2004) unified os.system, os.spawn*, os.popen*, popen2.*, and commands.*. Despite this, os.system() still appears in code written in 2025. -
The GIL doesn't matter for infra. Infrastructure scripts are I/O-bound (waiting for SSH, APIs, files). I/O-bound code benefits from threading even with the GIL.
-
pip didn't exist until 2008. Before pip, easy_install couldn't even uninstall packages. "pip" = "pip installs packages" (recursive acronym). -
Click powers many modern CLI tools. Click (2014, from the Flask/Pallets team) became the go-to alternative to argparse for Python CLIs. Flask's own CLI, Datasette, and hundreds of DevOps tools use it.
-
Python replaced Perl. Perl's TIMTOWTDI ("There's More Than One Way To Do It") lost to Python's "There should be one obvious way." Readability won.
-
uv is rewriting Python tooling in Rust. uv (2024, by Astral/Ruff creators) is 10-100x faster than pip. Replaces pip, pip-tools, virtualenv, and pyenv.
Flashcard Review¶
Basics¶
| Q | A |
|---|---|
| What is Python (one line)? | Interpreted, dynamically-typed language. Named after Monty Python, runs everywhere |
| How do you run a Python script? | python3 script.py or ./script.py with shebang #!/usr/bin/env python3 |
| What are Python's basic types? | str, int, float, bool, None |
| What is an f-string? | Formatted string: f"Hello {name}" — any expression inside {} |
| if __name__ == "__main__": — what does it do? | Runs code only when file is executed directly, not when imported |
Data Structures¶
| Q | A |
|---|---|
| list vs tuple? | List is mutable [1,2,3], tuple is immutable (1,2,3) |
| What is dict.get(key, default)? | Returns default instead of raising KeyError on missing key |
| What does Counter.most_common(10) return? | List of (key, count) tuples, sorted by count descending |
| What does defaultdict(list) do? | Auto-creates empty list for missing keys — no "if key not in dict" needed |
| What is a list comprehension? | Inline filter/transform: [x for x in items if condition] |
Operations¶
| Q | A |
|---|---|
| try/except vs Bash set -e? | try/except catches specific errors with recovery. set -e just exits |
| with open(file) as f: — why with? | Guarantees file is closed even on exceptions (context manager) |
| subprocess.run() — why pass a list not a string? | Avoids shell injection. shell=True with user input is a security hole |
| Why always set timeout= on requests? | Without it, the call blocks forever if the server doesn't respond |
| yaml.safe_load() vs yaml.load()? | safe_load prevents code execution from YAML. load is a security vulnerability |
Infrastructure¶
| Q | A |
|---|---|
| How does boto3 find credentials? | Explicit params → env vars → ~/.aws/credentials → instance metadata |
| Why must you paginate AWS API calls? | APIs return max 100-1000 results. Without pagination, you miss the rest |
| requests.get() vs curl? | requests gives you sessions, retries, JSON parsing, proper error types |
| When do you use threading vs multiprocessing? | Threading for I/O-bound (HTTP, SSH). Multiprocessing for CPU-bound (math) |
| What does ThreadPoolExecutor(max_workers=20) do? | Runs up to 20 tasks in parallel using a thread pool |
Debugging and Packaging¶
| Q | A |
|---|---|
| How do you enter the Python debugger? | breakpoint() in code, or python3 -m pdb script.py |
| What is a virtual environment? | Isolated Python installation with its own packages. Created with python3 -m venv .venv |
| What does pip freeze do? | Dumps all installed packages with exact versions |
| What is pip-tools? | Separates what you want (requirements.in) from what you get (requirements.txt) |
Drills¶
Drill 1: Parse JSON API Response (Easy)¶
Q: Write a Python one-liner to fetch a URL and pretty-print the JSON response.
Answer
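One possible answer (the URL is illustrative); the curl variant is often all you need:

```shell
# Pure Python, stdlib only: fetch, parse, re-serialize with indentation
python3 -c "import json, sys, urllib.request; print(json.dumps(json.load(urllib.request.urlopen(sys.argv[1], timeout=5)), indent=2))" https://api.github.com

# Or, if you already have curl, pipe into the stdlib pretty-printer
curl -s https://api.github.com | python3 -m json.tool
```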
Drill 2: Read and Filter YAML (Easy)¶
Q: Read a Kubernetes YAML file and print all container image names.
Answer
Drill 3: subprocess Safely (Easy)¶
Q: Run kubectl get pods -o json from Python and list pod names with their status.
Answer
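One possible answer; the parsing is separated from the subprocess call so it can be tested without a cluster:

```python
import json
import subprocess

def pod_statuses(pods_json):
    """Extract (name, phase) pairs from 'kubectl get pods -o json' output."""
    return [(item["metadata"]["name"], item["status"]["phase"])
            for item in pods_json.get("items", [])]

if __name__ == "__main__":
    result = subprocess.run(
        ["kubectl", "get", "pods", "-o", "json"],  # list args: no shell injection
        capture_output=True, text=True, check=True,
    )
    for name, phase in pod_statuses(json.loads(result.stdout)):
        print(f"{name}: {phase}")
```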
Key: list args (not string), `capture_output=True`, `check=True`, `text=True`.

Drill 4: pathlib File Processing (Easy)¶
Q: Find all .yaml files in a directory tree and count total lines.
Answer
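One possible answer using rglob; swap '.' for the directory you care about:

```python
from pathlib import Path

def count_yaml_lines(root):
    """Return (file_count, total_lines) for every .yaml file under root."""
    files = total = 0
    for path in Path(root).rglob("*.yaml"):
        with path.open(errors="replace") as f:
            total += sum(1 for _ in f)   # stream: no whole-file read
        files += 1
    return files, total

if __name__ == "__main__":
    files, lines = count_yaml_lines(".")
    print(f"{files} files, {lines} lines")
```

Real repos often mix .yaml and .yml; a second rglob("*.yml") pass covers that.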
Drill 5: Environment Variables with Validation (Easy)¶
Q: Read config from environment variables with defaults and validation.
Answer
import os, sys
def require_env(name):
val = os.environ.get(name)
if not val:
print(f"ERROR: {name} required", file=sys.stderr)
sys.exit(1)
return val
config = {
'db_host': os.environ.get('DB_HOST', 'localhost'),
'db_port': int(os.environ.get('DB_PORT', '5432')),
'db_name': require_env('DB_NAME'),
'debug': os.environ.get('DEBUG', 'false').lower() == 'true',
}
Drill 6: HTTP Health Check (Medium)¶
Q: Check health endpoints for multiple services and exit non-zero if any fail.
Answer
import urllib.request, sys
SERVICES = {
'api': 'http://localhost:8080/health',
'frontend': 'http://localhost:3000/health',
}
failures = []
for name, url in SERVICES.items():
    try:
        ok = urllib.request.urlopen(url, timeout=5).getcode() == 200
    except Exception:
        ok = False
    if not ok:
        failures.append(name)
    print(f" {name}: {'OK' if ok else 'FAIL'}")
sys.exit(1 if failures else 0)
Drill 7: Log Parsing with Counter (Medium)¶
Q: Parse nginx access logs and report top 10 IPs.
Answer
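One possible answer; the counting is split out so it works on any iterable of lines (the log path is illustrative):

```python
import sys
from collections import Counter

def top_ips(lines, n=10):
    """Count the first field (client IP) of each log line."""
    counts = Counter()
    for line in lines:
        parts = line.split()
        if parts:                  # skip blank lines
            counts[parts[0]] += 1
    return counts.most_common(n)

if __name__ == "__main__":
    with open(sys.argv[1], errors="replace") as f:
        for ip, count in top_ips(f):
            print(f"{count:>8} {ip}")
```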
Drill 8: Jinja2 Templating (Medium)¶
Q: Generate Kubernetes manifests from a template for multiple services.
Answer
from jinja2 import Template
TMPL = Template("""apiVersion: apps/v1
kind: Deployment
metadata:
name: {{ name }}
spec:
replicas: {{ replicas }}
template:
spec:
containers:
- name: {{ name }}
image: {{ image }}:{{ tag }}
""")
for svc in [
{'name': 'api', 'replicas': 3, 'image': 'myapp/api', 'tag': 'v2.1'},
{'name': 'worker', 'replicas': 2, 'image': 'myapp/worker', 'tag': 'v2.1'},
]:
print("---")
print(TMPL.render(**svc))
Drill 9: Retry Decorator (Medium)¶
Q: Write a retry decorator with exponential backoff.
Answer
import time, functools
def retry(max_attempts=3, base_delay=1, backoff_factor=2):
def decorator(func):
@functools.wraps(func)
def wrapper(*args, **kwargs):
for attempt in range(1, max_attempts + 1):
try:
return func(*args, **kwargs)
except Exception as e:
if attempt == max_attempts:
raise
delay = base_delay * (backoff_factor ** (attempt - 1))
print(f"Attempt {attempt} failed: {e}. Retrying in {delay}s...")
time.sleep(delay)
return wrapper
return decorator
@retry(max_attempts=3, base_delay=2)
def call_api():
return requests.get(url, timeout=10).json()
Drill 10: Kubernetes CrashLoopBackOff Detection (Hard)¶
Q: Use the Python Kubernetes client to find all CrashLoopBackOff pods.
Answer
from kubernetes import client, config
try:
config.load_incluster_config()
except config.ConfigException:
config.load_kube_config()
v1 = client.CoreV1Api()
for pod in v1.list_pod_for_all_namespaces().items:
for cs in (pod.status.container_statuses or []):
waiting = cs.state.waiting
if waiting and waiting.reason == 'CrashLoopBackOff':
print(f" {pod.metadata.namespace}/{pod.metadata.name} "
f"({cs.name}) - {cs.restart_count} restarts")
Drill 11: Translate a Bash Pipeline (Medium)¶
Q: Translate to Python: cat /etc/passwd | grep -v '^#' | awk -F: '$7 !~ /nologin|false/ {print $1, $7}' | sort
Answer
results = []
with open("/etc/passwd") as f:
for line in f:
line = line.strip()
if line.startswith("#") or not line:
continue
parts = line.split(":")
if len(parts) >= 7:
user, shell = parts[0], parts[6]
if "nologin" not in shell and "false" not in shell:
results.append((user, shell))
for user, shell in sorted(results):
print(f"{user} {shell}")
Cheat Sheet¶
Bash → Python Rosetta Stone¶
| Bash | Python | Notes |
|---|---|---|
| $var | var | No prefix, no quoting |
| echo "$var" | print(f"{var}") | f-strings |
| ${#string} | len(string) | Works on lists, dicts, strings |
| $((x + 1)) | x + 1 | Math is native |
| [[ $a == $b ]] | a == b | No brackets |
| [ -f file ] | Path(file).is_file() | from pathlib import Path |
| declare -A | mydict = {} | First-class data structure |
| for x in ...; do | for x in ...: | Colon, not semicolon-do |
| while read line | for line in f: | File iteration |
| func() { ... } | def func(): ... | Indentation, not braces |
| $1, $2 | Named params | def f(host, port): |
| $(cmd) | subprocess.run(...) | Prefer native Python |
| cmd \| grep \| awk | for/if/split | Data stays in-process |
| set -e | try/except | Per-operation, specific |
| exit 1 | sys.exit(1) | Or raise exception |
| source file.sh | import module | Namespaced |
| sort \| uniq -c | Counter() | from collections import Counter |
| curl | requests | Sessions, retries, JSON |
| jq | json module | Native data structures |
| find -name '*.log' | Path.rglob('*.log') | Returns Path objects |
| mktemp + trap EXIT | with tempfile.NamedTemporaryFile(): | Cleanup guaranteed |
Quick Commands¶
# Pretty-print JSON
python3 -m json.tool < file.json
# HTTP server
python3 -m http.server 8000
# Check syntax
python3 -m py_compile script.py
# Profile performance
python3 -m cProfile -s cumtime script.py | head -30
# Create venv
python3 -m venv .venv && source .venv/bin/activate
# Generate password
python3 -c "import secrets; print(secrets.token_urlsafe(32))"
Self-Assessment¶
Core Language¶
- I can write and run a Python script with a shebang
- I understand types (str, int, float, bool, None) and explicit conversion
- I can use f-strings for formatted output
- I can use lists, dicts, tuples, and Counter
- I understand list comprehensions
- I can write functions with named arguments and defaults
- I can use with open() for file I/O
- I can use try/except for specific error handling
- I know when Bash is still the right tool vs when to switch to Python
Infrastructure Libraries¶
- I can use subprocess.run() safely (list args, check=True, no shell=True)
- I can use pathlib for file operations instead of os.path
- I can parse JSON and YAML (with safe_load)
- I can use requests with sessions, retries, and timeouts
- I can write a CLI tool with argparse or Click
Cloud and Automation¶
- I can use boto3 with paginators and error handling
- I can run parallel operations with ThreadPoolExecutor
- I can use the Kubernetes Python client
- I can generate config files with Jinja2
- I understand the GIL and when threading vs multiprocessing applies
Production Readiness¶
- I use logging instead of print for production scripts
- I can set up virtual environments and pin dependencies
- I can use breakpoint() and pdb for debugging
- I know the major footguns (no timeout, shell=True, yaml.load, no pagination)
- I can write basic pytest tests for my scripts
Related Lessons¶
- Python — Zero to Script for the Terminal Native — Ground-zero lesson with syslog parsing mission
- Python for Ops — The Bash Expert's Bridge — Deep Bash-to-Python translation guide
- Python Data Wrangling for Ops — Log parsing and data transformation at scale
- Python — Automating Everything: APIs and Infrastructure — Building a morning check tool with K8s, Prometheus, AWS, and Slack