Portal | Level: L1: Foundations | Topics: Python Debugging, Python Automation | Domain: DevOps & Tooling
Python Debugging - Primer¶
Why This Matters¶
Every DevOps engineer writes Python: deployment scripts, monitoring integrations, API clients, configuration generators. When those scripts fail in production at 3 AM, you need debugging skills that go beyond adding print() statements. Understanding Python's debugging tools — from pdb to profilers to production tracing — is the difference between a 10-minute fix and a 4-hour guessing game.
pdb: The Built-in Debugger¶
Under the hood: The breakpoint() builtin (PEP 553, Python 3.7) calls sys.breakpointhook(), which defaults to pdb.set_trace(). You can swap the debugger entirely via the PYTHONBREAKPOINT environment variable: set it to ipdb.set_trace, pudb.set_trace, or 0 to disable all breakpoints. This makes breakpoint() a universal hook, not just a pdb shortcut.
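A quick sketch of that hook in action: replacing sys.breakpointhook with your own function (log_hook below is a made-up name) reroutes every breakpoint() call, which is the same mechanism PYTHONBREAKPOINT uses.

```python
import sys

calls = []

def log_hook(*args, **kwargs):
    # Stand-in "debugger": record the hit instead of dropping into pdb
    calls.append("breakpoint hit")

# breakpoint() always routes through sys.breakpointhook, so swapping the
# hook redirects every breakpoint() call in the process
sys.breakpointhook = log_hook

def double(x):
    breakpoint()  # invokes log_hook, not pdb.set_trace()
    return x * 2

result = double(21)  # runs to completion, no interactive pause
```

This is also why PYTHONBREAKPOINT=0 is useful in CI: a forgotten breakpoint() can't hang the pipeline.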
Python ships with pdb, a full interactive debugger. Since Python 3.7, breakpoint() is the preferred way to enter it.
Entering the Debugger¶
# Modern (Python 3.7+)
def process_data(records):
    for record in records:
        breakpoint()  # Drops into pdb here
        transform(record)

# Legacy
import pdb; pdb.set_trace()

# From the command line (debug from the start)
$ python -m pdb script.py

# Post-mortem debugging (run normally, stop only at the crash)
$ python -m pdb -c continue script.py
# When it crashes, you're dropped into pdb at the crash site
Essential pdb Commands¶
| Command | Short | What It Does |
|---|---|---|
| next | n | Execute the next line (step over function calls) |
| step | s | Step into a function call |
| continue | c | Continue execution until the next breakpoint |
| list | l | Show source code around the current line |
| longlist | ll | Show the entire current function |
| print expr | p expr | Print the value of an expression |
| pp expr | | Pretty-print an expression (useful for dicts, lists) |
| where | w | Show the call stack (where am I?) |
| up | u | Move up one frame in the call stack |
| down | d | Move down one frame in the call stack |
| break | b | Set a breakpoint (b 42 = line 42, b func = at function) |
| clear | cl | Clear breakpoints |
| return | r | Continue until the current function returns |
| quit | q | Quit the debugger |
| !statement | | Execute a Python statement (useful when variable names shadow commands) |
Common pdb Workflow¶
# You're debugging a data processing pipeline
def process_batch(items):
    results = []
    for item in items:
        breakpoint()
        # In pdb:
        #   p item                     -> inspect the current item
        #   p len(results)             -> check progress
        #   p item.keys()              -> see what fields exist
        #   !item['status'] = 'fixed'  -> mutate data on the fly
        #   c                          -> continue to next iteration
        result = transform(item)
        results.append(result)
    return results
Conditional Breakpoints¶
# Break only when a condition is met.
# Interactive: set a numbered breakpoint, then attach a condition to it:
#   (Pdb) b 42
#   (Pdb) condition 1 order.total > 10000
# Or set it programmatically:
def process_order(order):
    if order.total > 10000:
        breakpoint()
Enhanced Debuggers: ipdb and pdb++¶
ipdb¶
ipdb adds IPython features to pdb: tab completion, syntax highlighting, better tracebacks.
import ipdb; ipdb.set_trace()
# Or set it as the default breakpoint handler
$ PYTHONBREAKPOINT=ipdb.set_trace python script.py
pdb++ (pdbpp)¶
pdbpp is a drop-in replacement that enhances pdb with sticky mode (shows code continuously), syntax highlighting, and tab completion.
Key pdb++ features:
- Sticky mode: sticky command shows a continuously updated code listing
- Smart command parsing: foo prints variable foo instead of requiring p foo
- Better tab completion: completes variable names, attributes, and methods
Remote Debugging with debugpy¶
For debugging code running in Docker containers, remote servers, or as background services, debugpy (Microsoft's Debug Adapter Protocol server) lets you attach VS Code or any DAP client.
# Add to your application startup
import debugpy
debugpy.listen(("0.0.0.0", 5678))
print("Waiting for debugger attach...")
debugpy.wait_for_client() # Optional: pause until debugger connects
# Expose the debug port in Docker
$ docker run -p 5678:5678 myapp
# In VS Code launch.json:
{
"name": "Attach to Remote",
"type": "python",
"request": "attach",
"connect": {"host": "localhost", "port": 5678}
}
For production, start debugpy only when signaled (don't leave it running):
import signal
import debugpy

def enable_debugger(signum, frame):
    debugpy.listen(("127.0.0.1", 5678))
    print("Debugger listening on port 5678")

signal.signal(signal.SIGUSR1, enable_debugger)
# Send SIGUSR1 to enable: kill -USR1 <pid>
The logging Module¶
For production debugging, logging is your primary tool. It's built-in, configurable, and doesn't require interactive access.
Logging Levels¶
| Level | Value | When to Use |
|---|---|---|
| DEBUG | 10 | Detailed diagnostic info (variable values, flow tracing) |
| INFO | 20 | Confirmation that things are working as expected |
| WARNING | 30 | Something unexpected but not broken (deprecation, retry) |
| ERROR | 40 | An operation failed but the app continues |
| CRITICAL | 50 | The application cannot continue |
Production Logging Pattern¶
import logging

# Configure once at application startup
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
    datefmt="%Y-%m-%dT%H:%M:%S%z",
)
logger = logging.getLogger(__name__)

def process_request(request_id, payload):
    logger.info("Processing request %s", request_id)
    try:
        result = do_work(payload)
        logger.debug("Result for %s: %r", request_id, result)
        return result
    except ValidationError as e:
        logger.warning("Validation failed for %s: %s", request_id, e)
        raise
    except Exception:
        logger.exception("Unexpected error processing %s", request_id)
        # logger.exception() auto-includes the traceback
        raise
Structured Logging (JSON)¶
For production systems shipping logs to ELK/Loki/Datadog:
import json
import logging

class JSONFormatter(logging.Formatter):
    def format(self, record):
        log_entry = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            log_entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(log_entry)

handler = logging.StreamHandler()
handler.setFormatter(JSONFormatter())
logging.root.addHandler(handler)
Or use the python-json-logger package for a ready-made solution.
Tracebacks and Exception Inspection¶
traceback Module¶
import traceback

try:
    risky_operation()
except Exception:
    # Print traceback without re-raising
    traceback.print_exc()
    # Capture traceback as a string (for logging, alerting)
    tb_str = traceback.format_exc()
    logger.error("Operation failed:\n%s", tb_str)
sys.exc_info()¶
import sys

try:
    risky_operation()
except Exception:
    exc_type, exc_value, exc_tb = sys.exc_info()
    # exc_type: <class 'ValueError'>
    # exc_value: the exception instance
    # exc_tb: traceback object (for programmatic inspection)
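These three values can be inspected programmatically instead of printed. A minimal sketch (risky_operation here is a stand-in that just raises):

```python
import sys
import traceback

def risky_operation():
    raise ValueError("bad record")  # stand-in failure

try:
    risky_operation()
except Exception:
    exc_type, exc_value, exc_tb = sys.exc_info()
    # extract_tb turns the traceback object into inspectable FrameSummary rows
    frames = traceback.extract_tb(exc_tb)
    innermost = frames[-1]  # the frame where the exception was actually raised
    summary = f"{exc_type.__name__} in {innermost.name}: {exc_value}"
    # summary == "ValueError in risky_operation: bad record"
```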
Exception Chaining (Python 3)¶
try:
    config = load_config()
except FileNotFoundError as e:
    raise RuntimeError("Cannot start without config") from e

# The traceback shows both exceptions:
#   FileNotFoundError: config.yaml not found
#
#   The above exception was the direct cause of the following exception:
#
#   RuntimeError: Cannot start without config
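The chain is also available programmatically via __cause__, useful when an outer handler needs to react to the original error. A sketch with stand-in functions:

```python
def load_config():
    raise FileNotFoundError("config.yaml not found")  # stand-in failure

def start():
    try:
        load_config()
    except FileNotFoundError as e:
        # "raise ... from e" stores the original exception on the new one
        raise RuntimeError("Cannot start without config") from e

try:
    start()
except RuntimeError as err:
    cause = err.__cause__  # the original FileNotFoundError
```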
assert and __debug__¶
# Assertions are removed when Python runs with -O (optimize)
assert isinstance(data, dict), f"Expected dict, got {type(data)}"
# __debug__ is True normally, False with -O
if __debug__:
    validate_expensive_invariant(data)
# In production, run with optimization to skip asserts:
$ python -O app.py
# Never use assert for input validation in production code
warnings Module¶
import warnings
# Issue a deprecation warning
def old_api():
    warnings.warn("old_api() is deprecated, use new_api()",
                  DeprecationWarning, stacklevel=2)
    return new_api()
# Control warning behavior
warnings.filterwarnings("error", category=DeprecationWarning) # Turn warnings into exceptions
warnings.filterwarnings("ignore", message=".*experimental.*") # Silence specific warnings
# From command line
$ python -W error::DeprecationWarning script.py # Warnings become errors
$ python -W ignore script.py # Silence all warnings
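In tests, warnings.catch_warnings(record=True) captures warnings instead of printing them, which lets you assert that a deprecation actually fires. A sketch using a variant of old_api() that returns a plain value:

```python
import warnings

def old_api():
    warnings.warn("old_api() is deprecated, use new_api()",
                  DeprecationWarning, stacklevel=2)
    return "result"

# record=True collects warnings into a list instead of printing them
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # disable the once-per-location dedup
    value = old_api()

first = caught[0]  # a WarningMessage with .category and .message
```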
faulthandler: Debugging Segfaults and Hangs¶
Debug clue: If a Python process suddenly exits with no traceback, check dmesg for segfault or killed entries. A missing traceback usually means a C extension crashed (segfault), the OOM killer fired, or the process received a signal like SIGKILL.
When Python crashes with a segfault (common with C extensions), the normal traceback is lost. faulthandler dumps the Python stack trace on crash.
# Enable via environment variable (simplest)
$ PYTHONFAULTHANDLER=1 python app.py
# Or enable in code
import faulthandler
faulthandler.enable()
# Dump traceback on signal (debugging hangs)
import faulthandler
import signal
faulthandler.register(signal.SIGUSR1)
# Now: kill -USR1 <pid> prints stack trace to stderr without stopping the process
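You can also trigger a dump on demand. dump_traceback() writes at the file-descriptor level, so it needs a real file with a fileno(), not an io.StringIO. A sketch:

```python
import faulthandler
import tempfile

faulthandler.enable()

with tempfile.TemporaryFile(mode="w+") as f:
    # Write the current Python stack(s) to the file instead of stderr
    faulthandler.dump_traceback(file=f)
    f.seek(0)
    dump = f.read()
# dump contains lines like:
#   Current thread 0x... (most recent call first):
#     File "...", line N in <module>
```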
Profiling: Finding Where Time Goes¶
cProfile¶
# Profile an entire script
$ python -m cProfile -s cumulative script.py
# Top output columns:
# ncalls — number of calls
# tottime — time in this function (excluding subcalls)
# cumtime — cumulative time (including subcalls)
# percall — per-call time
# Profile a specific function
import cProfile
cProfile.run('process_batch(data)', sort='cumulative')
# Save profile for analysis
$ python -m cProfile -o profile.out script.py
$ python -m pstats profile.out
# In pstats: sort cumulative, stats 20
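cProfile can also be driven programmatically, the usual pattern for profiling one code path inside a larger service. A sketch (busy_work is a made-up hot function):

```python
import cProfile
import io
import pstats

def busy_work():
    return sum(i * i for i in range(100_000))  # stand-in hot path

profiler = cProfile.Profile()
profiler.enable()
busy_work()
profiler.disable()

# Render the stats to a string instead of stdout
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(10)
report = stream.getvalue()  # includes a row for busy_work
```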
line_profiler (Line-by-Line)¶
$ pip install line_profiler
# Decorate the function you want to profile
@profile
def slow_function():
    data = load_data()           # 0.1s
    processed = transform(data)  # 3.2s <-- bottleneck found
    save_results(processed)      # 0.3s
$ kernprof -l -v script.py
Memory Profiling with tracemalloc¶
import tracemalloc
tracemalloc.start()
# ... your code runs ...
snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics('lineno')
print("Top 10 memory allocations:")
for stat in top_stats[:10]:
    print(stat)
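Comparing two snapshots is usually more useful than one: it shows which lines allocated memory between two points in time, which is how you narrow down a leak. A sketch with a simulated leak:

```python
import tracemalloc

tracemalloc.start()
before = tracemalloc.take_snapshot()

leak = [bytes(1_000) for _ in range(1_000)]  # ~1 MB that stays reachable

after = tracemalloc.take_snapshot()
growth = after.compare_to(before, "lineno")  # biggest changes first
top = growth[0]
# top.size_diff is the bytes newly allocated on that line
tracemalloc.stop()
```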
objgraph: Finding Memory Leaks¶
import objgraph
# What types have the most instances?
objgraph.show_most_common_types(limit=10)
# What's holding a reference to this object? (why isn't it GC'd?)
objgraph.show_backrefs(objgraph.by_type('MyClass')[0], filename='refs.png')
# How many new objects since last check?
objgraph.show_growth(limit=10)
strace on Python Processes¶
When Python itself seems stuck and the debugger can't help, strace shows what system calls the process is making:
# Attach to a running Python process
$ strace -p <pid> -e trace=network,read,write -f
# -f follows child threads
# -e filters to specific syscall categories
# Common findings:
# - Stuck on read() from a socket → waiting for a network response (DNS? API? DB?)
# - Stuck on futex() → waiting on a lock (GIL contention? threading deadlock?)
# - Stuck on poll() with timeout → event loop waiting (normal for idle async)
# - Repeated open()/stat() on missing file → config or import path issue
# Trace a Python script from the start
$ strace -o trace.log -f python script.py
$ grep -c 'open(' trace.log # How many file opens?
The trace Module¶
Built-in tracing of Python execution — shows every line as it runs:
# Trace all executed lines
$ python -m trace --trace script.py
# Count line executions (find hot paths)
$ python -m trace --count script.py
# Creates .cover files showing execution counts per line
# List functions called
$ python -m trace --listfuncs script.py
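The same machinery is available programmatically via trace.Trace, which lets you collect line counts for a single function call. A sketch (accumulate is illustrative):

```python
import trace

def accumulate(n):
    total = 0
    for i in range(n):
        total += i  # this line's count reveals the hot loop
    return total

# count=True records per-line execution counts; trace=False keeps it quiet
tracer = trace.Trace(count=True, trace=False)
result = tracer.runfunc(accumulate, 5)  # returns accumulate's result

# counts maps (filename, lineno) -> number of executions
counts = tracer.results().counts
hottest = max(counts.values())  # the loop lines ran the most
```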
Debugging in Production (Without Interactive Access)¶
In production, you rarely have interactive debugger access. Your toolkit:
- Logging with appropriate levels and structured output
- Metrics (Prometheus counters for error rates, latencies, queue depths)
- Tracing (OpenTelemetry spans for request flow across services)
- faulthandler for segfault stack traces
- py-spy for sampling profiler that attaches to running processes without restart
- Signal handlers that dump state on SIGUSR1/SIGUSR2
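The last item on that list can be sketched with only the standard library: a SIGUSR1 handler that dumps every thread's stack to stderr without stopping the process (Unix only; names are illustrative):

```python
import signal
import sys
import threading
import traceback

def dump_state(signum, frame):
    # Print every thread's current stack to stderr; the process keeps running
    for thread in threading.enumerate():
        print(f"--- Thread: {thread.name} ---", file=sys.stderr)
        stack = sys._current_frames().get(thread.ident)
        if stack is not None:
            traceback.print_stack(stack, file=sys.stderr)

# Wire it up (Unix only); trigger with: kill -USR1 <pid>
signal.signal(signal.SIGUSR1, dump_state)
```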
# py-spy: attach to a running Python process (no restart needed)
$ pip install py-spy
# Live top-like view of where time is spent
$ py-spy top --pid <pid>
# Record a flame graph
$ py-spy record -o profile.svg --pid <pid> --duration 30
# Dump current stack traces of all threads
$ py-spy dump --pid <pid>
py-spy reads process memory directly, so it works on processes you didn't instrument, including those running in Docker containers (from the host, pass the target process's host-namespace PID, which you can find with docker top <container>).
Gotcha: py-spy requires the SYS_PTRACE capability, which Docker drops by default. Run with docker run --cap-add SYS_PTRACE, or relax ptrace_scope on the host: sysctl kernel.yama.ptrace_scope=0. In Kubernetes, add the capability to the pod's securityContext.
Wiki Navigation¶
Prerequisites¶
- Python for Infrastructure (Topic Pack, L1)
Related Content¶
- Perl Flashcards (CLI) (flashcard_deck, L1) — Python Automation
- Python Async & Concurrency (Topic Pack, L2) — Python Automation
- Python Drills (Drill, L0) — Python Automation
- Python Exercises (Quest Ladder) (CLI) (Exercise Set, L0) — Python Automation
- Python Flashcards (CLI) (flashcard_deck, L1) — Python Automation
- Python Packaging (Topic Pack, L2) — Python Automation
- Python for Infrastructure (Topic Pack, L1) — Python Automation
- Skillcheck: Python Automation (Assessment, L0) — Python Automation
- Software Development Flashcards (CLI) (flashcard_deck, L1) — Python Automation