
Portal | Level: L1: Foundations | Topics: Python Debugging, Python Automation | Domain: DevOps & Tooling

Python Debugging - Primer

Why This Matters

Every DevOps engineer writes Python: deployment scripts, monitoring integrations, API clients, configuration generators. When those scripts fail in production at 3 AM, you need debugging skills that go beyond adding print() statements. Understanding Python's debugging tools — from pdb to profilers to production tracing — is the difference between a 10-minute fix and a 4-hour guessing game.

pdb: The Built-in Debugger

Under the hood: The breakpoint() builtin (PEP 553, Python 3.7) calls sys.breakpointhook(), which defaults to pdb.set_trace(). You can swap the debugger entirely via the PYTHONBREAKPOINT environment variable — set it to ipdb.set_trace, pudb.set_trace, or 0 to disable all breakpoints. This makes breakpoint() a universal hook, not just a pdb shortcut.

Python ships with pdb, a full interactive debugger. Since Python 3.7, breakpoint() is the preferred way to enter it.

Entering the Debugger

# Modern (Python 3.7+)
def process_data(records):
    for record in records:
        breakpoint()  # Drops into pdb here
        transform(record)

# Legacy
import pdb; pdb.set_trace()

# From the command line (debug from the start)
$ python -m pdb script.py

# Post-mortem debugging (after an exception)
$ python -m pdb -c continue script.py
# Runs the script to completion; when an uncaught exception is raised,
# you're dropped into pdb at the crash site

Essential pdb Commands

Command      Short   What It Does
next         n       Execute the next line (step over function calls)
step         s       Step into a function call
continue     c       Continue execution until the next breakpoint
list         l       Show source code around the current line
longlist     ll      Show the entire current function
p expr       -       Print the value of an expression
pp expr      -       Pretty-print a value (useful for dicts, lists)
where        w       Show the call stack (where am I?)
up           u       Move up one frame in the call stack
down         d       Move down one frame in the call stack
break        b       Set a breakpoint (b 42 = line 42, b func = at function)
clear        cl      Clear breakpoints
return       r       Continue until the current function returns
quit         q       Quit the debugger
!statement   -       Execute a Python statement (useful when variable names shadow commands)

Common pdb Workflow

# You're debugging a data processing pipeline
def process_batch(items):
    results = []
    for item in items:
        breakpoint()
        # In pdb:
        # p item           — inspect the current item
        # p len(results)   — check progress
        # p item.keys()    — see what fields exist
        # !item['status'] = 'fixed'  — mutate data on the fly
        # c                — continue to next iteration
        result = transform(item)
        results.append(result)
    return results

Conditional Breakpoints

# Break only when a condition is met
def process_order(order):
    # In pdb, set a breakpoint first, then attach a condition to it:
    #   b process_order
    #   condition 1 order.total > 10000
    # Or guard the breakpoint programmatically:
    if order.total > 10000:
        breakpoint()

Enhanced Debuggers: ipdb and pdb++

ipdb

ipdb adds IPython features to pdb: tab completion, syntax highlighting, better tracebacks.

$ pip install ipdb
import ipdb; ipdb.set_trace()

# Or set it as the default breakpoint handler
$ PYTHONBREAKPOINT=ipdb.set_trace python script.py

pdb++ (pdbpp)

pdbpp is a drop-in replacement that enhances pdb with sticky mode (shows code continuously), syntax highlighting, and tab completion.

$ pip install pdbpp
# Now 'breakpoint()' automatically uses pdb++ instead of pdb

Key pdb++ features:

  • Sticky mode: the sticky command shows a continuously updated code listing
  • Smart command parsing: typing foo prints the variable foo instead of requiring p foo
  • Better tab completion: completes variable names, attributes, and methods

Remote Debugging with debugpy

For debugging code running in Docker containers, remote servers, or as background services, debugpy (Microsoft's Debug Adapter Protocol server) lets you attach VS Code or any DAP client.

# Add to your application startup
import debugpy
debugpy.listen(("0.0.0.0", 5678))
print("Waiting for debugger attach...")
debugpy.wait_for_client()  # Optional: pause until a debugger connects

# Expose the debug port in Docker
$ docker run -p 5678:5678 myapp

# In VS Code launch.json:
{
    "name": "Attach to Remote",
    "type": "python",
    "request": "attach",
    "connect": {"host": "localhost", "port": 5678}
}

For production, start debugpy only when signaled (don't leave it running):

import signal
import debugpy

def enable_debugger(signum, frame):
    debugpy.listen(("127.0.0.1", 5678))
    print("Debugger listening on port 5678")

signal.signal(signal.SIGUSR1, enable_debugger)
# Send SIGUSR1 to enable: kill -USR1 <pid>

The logging Module

For production debugging, logging is your primary tool. It's built-in, configurable, and doesn't require interactive access.

Logging Levels

Level      Value   When to Use
DEBUG      10      Detailed diagnostic info (variable values, flow tracing)
INFO       20      Confirmation that things are working as expected
WARNING    30      Something unexpected but not broken (deprecation, retry)
ERROR      40      An operation failed but the app continues
CRITICAL   50      The application cannot continue
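
The threshold behavior can be verified in a few lines — a minimal sketch using a hypothetical "demo" logger writing to an in-memory stream:

```python
import io
import logging

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))

logger = logging.getLogger("demo")  # hypothetical logger name
logger.addHandler(handler)
logger.setLevel(logging.WARNING)    # records below WARNING (30) are dropped

logger.debug("not shown")           # 10 < 30 → filtered out
logger.info("not shown")            # 20 < 30 → filtered out
logger.warning("disk at 90%")       # 30 >= 30 → emitted

print(stream.getvalue().strip())    # → WARNING disk at 90%
```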

Production Logging Pattern

import logging

# Configure once at application startup
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
    datefmt="%Y-%m-%dT%H:%M:%S%z",
)

logger = logging.getLogger(__name__)

def process_request(request_id, payload):
    logger.info("Processing request %s", request_id)
    try:
        result = do_work(payload)
        logger.debug("Result for %s: %r", request_id, result)
        return result
    except ValidationError as e:
        logger.warning("Validation failed for %s: %s", request_id, e)
        raise
    except Exception:
        logger.exception("Unexpected error processing %s", request_id)
        # logger.exception() auto-includes the traceback
        raise

Structured Logging (JSON)

For production systems shipping logs to ELK/Loki/Datadog:

import json
import logging

class JSONFormatter(logging.Formatter):
    def format(self, record):
        log_entry = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            log_entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(log_entry)

handler = logging.StreamHandler()
handler.setFormatter(JSONFormatter())
logging.root.addHandler(handler)

Or use the python-json-logger package for a ready-made solution.

Tracebacks and Exception Inspection

traceback Module

import traceback

try:
    risky_operation()
except Exception:
    # Print traceback without re-raising
    traceback.print_exc()

    # Capture traceback as a string (for logging, alerting)
    tb_str = traceback.format_exc()
    logger.error("Operation failed:\n%s", tb_str)

sys.exc_info()

import sys

try:
    risky_operation()
except Exception:
    exc_type, exc_value, exc_tb = sys.exc_info()
    # exc_type: <class 'ValueError'>
    # exc_value: the exception instance
    # exc_tb: traceback object (for programmatic inspection)
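
The traceback object can be walked programmatically via the traceback module — a sketch with a hypothetical risky_operation that always raises:

```python
import sys
import traceback

def risky_operation():
    # Hypothetical stand-in: always fails for the demo
    raise ValueError("bad input")

try:
    risky_operation()
except Exception:
    _, exc_value, exc_tb = sys.exc_info()
    # extract_tb turns the traceback into a list of FrameSummary objects
    frames = traceback.extract_tb(exc_tb)
    for frame in frames:
        print(f"{frame.filename}:{frame.lineno} in {frame.name}")
    innermost = frames[-1]  # the frame where the exception was raised
```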

Exception Chaining (Python 3)

try:
    config = load_config()
except FileNotFoundError as e:
    raise RuntimeError("Cannot start without config") from e
    # The traceback shows both exceptions:
    # FileNotFoundError: config.yaml not found
    # The above exception was the direct cause of:
    # RuntimeError: Cannot start without config
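
The chain is inspectable at runtime through `__cause__` — a sketch with a hypothetical load_config that always fails:

```python
def load_config():
    # Hypothetical: always fails so the chaining is visible
    raise FileNotFoundError("config.yaml not found")

# 'raise ... from e' records an explicit cause
# ('raise ... from None' would suppress the chain instead)
try:
    try:
        load_config()
    except FileNotFoundError as e:
        raise RuntimeError("Cannot start without config") from e
except RuntimeError as err:
    cause = err.__cause__  # the original FileNotFoundError
    chained = isinstance(cause, FileNotFoundError)
```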

assert and debug

# Assertions are removed when Python runs with -O (optimize)
assert isinstance(data, dict), f"Expected dict, got {type(data)}"

# __debug__ is True normally, False with -O
if __debug__:
    validate_expensive_invariant(data)

# In production, run with optimization to skip asserts:
$ python -O app.py
# Never use assert for input validation in production code

warnings Module

import warnings

# Issue a deprecation warning
def old_api():
    warnings.warn("old_api() is deprecated, use new_api()", DeprecationWarning, stacklevel=2)
    return new_api()

# Control warning behavior
warnings.filterwarnings("error", category=DeprecationWarning)  # Turn warnings into exceptions
warnings.filterwarnings("ignore", message=".*experimental.*")   # Silence specific warnings

# From command line
$ python -W error::DeprecationWarning script.py    # Warnings become errors
$ python -W ignore script.py                        # Silence all warnings
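
In tests, warnings can be captured locally without touching the global filters, using warnings.catch_warnings — a sketch around the hypothetical old_api above:

```python
import warnings

def old_api():
    # Hypothetical deprecated function, as in the section above
    warnings.warn("old_api() is deprecated", DeprecationWarning, stacklevel=2)
    return 42

# Record warnings in a list; global filter state is restored on exit
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure nothing is suppressed
    result = old_api()

first = caught[0]  # a WarningMessage with .category, .message, .filename
```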

faulthandler: Debugging Segfaults and Hangs

Debug clue: If a Python process suddenly exits with no traceback, check dmesg for segfault or killed entries. A missing traceback usually means a C extension crashed (segfault), the OOM killer fired, or the process received a signal like SIGKILL.

When Python crashes with a segfault (common with C extensions), the normal traceback is lost. faulthandler dumps the Python stack trace on crash.

# Enable via environment variable (simplest)
$ PYTHONFAULTHANDLER=1 python app.py

# Or enable in code
import faulthandler
faulthandler.enable()

# Dump traceback on signal (debugging hangs)
import faulthandler
import signal
faulthandler.register(signal.SIGUSR1)
# Now: kill -USR1 <pid> prints stack trace to stderr without stopping the process
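
faulthandler can also act as a hang watchdog: dump_traceback_later schedules a stack dump if the process is still running after a timeout. A sketch that deliberately lets the timer fire, writing to a temp file instead of stderr:

```python
import faulthandler
import tempfile
import time

# Schedule a stack dump in 0.2s, then "hang" past the deadline
with tempfile.NamedTemporaryFile(mode="w+") as f:
    faulthandler.dump_traceback_later(0.2, file=f)
    time.sleep(0.5)                           # watchdog fires during this sleep
    faulthandler.cancel_dump_traceback_later()
    f.seek(0)
    dump = f.read()

# The dump starts with "Timeout (...)!" followed by every thread's stack
fired = "Timeout" in dump
```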

Profiling: Finding Where Time Goes

cProfile

# Profile an entire script
$ python -m cProfile -s cumulative script.py

# Top output columns:
# ncalls    — number of calls
# tottime   — time in this function (excluding subcalls)
# cumtime   — cumulative time (including subcalls)
# percall   — per-call time

# Profile a specific function
import cProfile

cProfile.run('process_batch(data)', sort='cumulative')

# Save profile for analysis
$ python -m cProfile -o profile.out script.py
$ python -m pstats profile.out
# In pstats: sort cumulative, stats 20
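
The same ranking can be scripted with the pstats API instead of the interactive viewer — a sketch profiling a hypothetical busy() function:

```python
import cProfile
import io
import pstats

def busy():
    # Hypothetical hot function for the demo
    return sum(i * i for i in range(10_000))

pr = cProfile.Profile()
pr.enable()
busy()
pr.disable()

# Scripted equivalent of 'sort cumulative' / 'stats 20' in pstats
buf = io.StringIO()
stats = pstats.Stats(pr, stream=buf)
stats.sort_stats("cumulative").print_stats(5)
report = buf.getvalue()
```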

line_profiler (Line-by-Line)

$ pip install line_profiler

# Decorate the function you want to profile
@profile
def slow_function():
    data = load_data()        # 0.1s
    processed = transform(data)  # 3.2s  <-- bottleneck found
    save_results(processed)   # 0.3s

$ kernprof -l -v script.py

Memory Profiling with tracemalloc

import tracemalloc

tracemalloc.start()

# ... your code runs ...

snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics('lineno')

print("Top 10 memory allocations:")
for stat in top_stats[:10]:
    print(stat)
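
For leak hunting, comparing two snapshots is usually more useful than one — compare_to ranks allocations by net growth between them. A sketch with a hypothetical growing structure:

```python
import tracemalloc

tracemalloc.start()
before = tracemalloc.take_snapshot()

# Hypothetical leak: a structure that keeps growing (~1 MB here)
hoard = [bytes(1_000) for _ in range(1_000)]

after = tracemalloc.take_snapshot()
top = after.compare_to(before, "lineno")  # diff ranked by net growth

for stat in top[:3]:
    print(stat)

biggest = top[0]  # the line that allocated the most new memory
```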

objgraph: Finding Memory Leaks

$ pip install objgraph
import objgraph

# What types have the most instances?
objgraph.show_most_common_types(limit=10)

# What's holding a reference to this object? (why isn't it GC'd?)
objgraph.show_backrefs(objgraph.by_type('MyClass')[0], filename='refs.png')

# How many new objects since last check?
objgraph.show_growth(limit=10)

strace on Python Processes

When Python itself seems stuck and the debugger can't help, strace shows what system calls the process is making:

# Attach to a running Python process
$ strace -p <pid> -e trace=network,read,write -f
# -f follows child threads
# -e filters to specific syscall categories

# Common findings:
# - Stuck on read() from a socket → waiting for a network response (DNS? API? DB?)
# - Stuck on futex() → waiting on a lock (GIL contention? threading deadlock?)
# - Stuck on poll() with timeout → event loop waiting (normal for idle async)
# - Repeated open()/stat() on missing file → config or import path issue

# Trace a Python script from the start
$ strace -o trace.log -f python script.py
$ grep -cE 'open(at)?\(' trace.log    # How many file opens? (modern glibc issues openat)

The trace Module

Built-in tracing of Python execution — shows every line as it runs:

# Trace all executed lines
$ python -m trace --trace script.py

# Count line executions (find hot paths)
$ python -m trace --count script.py
# Creates .cover files showing execution counts per line

# List functions called
$ python -m trace --listfuncs script.py

Debugging in Production (Without Interactive Access)

In production, you rarely have interactive debugger access. Your toolkit:

  1. Logging with appropriate levels and structured output
  2. Metrics (Prometheus counters for error rates, latencies, queue depths)
  3. Tracing (OpenTelemetry spans for request flow across services)
  4. faulthandler for segfault stack traces
  5. py-spy for sampling profiler that attaches to running processes without restart
  6. Signal handlers that dump state on SIGUSR1/SIGUSR2
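
Item 6 can be sketched with sys._current_frames(), which the standard library exposes for exactly this kind of state dump (a hypothetical helper, not a full signal handler):

```python
import sys
import traceback

def dump_all_threads():
    # Render every live thread's stack — what a SIGUSR1 handler might log
    lines = []
    for thread_id, frame in sys._current_frames().items():
        lines.append(f"--- thread {thread_id} ---\n")
        lines.extend(traceback.format_stack(frame))
    return "".join(lines)

report = dump_all_threads()
print(report)
```

Wiring this into `signal.signal(signal.SIGUSR1, ...)` follows the same pattern shown for faulthandler earlier.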
# py-spy: attach to a running Python process (no restart needed)
$ pip install py-spy

# Live top-like view of where time is spent
$ py-spy top --pid <pid>

# Record a flame graph
$ py-spy record -o profile.svg --pid <pid> --duration 30

# Dump current stack traces of all threads
$ py-spy dump --pid <pid>

py-spy reads process memory directly — it works on processes you didn't instrument, including those running in Docker containers (use --pid of the container's PID 1 from the host namespace).

Gotcha: py-spy requires SYS_PTRACE capability, which Docker drops by default. Run with docker run --cap-add SYS_PTRACE or set ptrace_scope on the host: sysctl kernel.yama.ptrace_scope=0. In Kubernetes, add the capability to the pod's securityContext.


Wiki Navigation

Prerequisites

  • Perl Flashcards (CLI) (flashcard_deck, L1) — Python Automation
  • Python Async & Concurrency (Topic Pack, L2) — Python Automation
  • Python Drills (Drill, L0) — Python Automation
  • Python Exercises (Quest Ladder) (CLI) (Exercise Set, L0) — Python Automation
  • Python Flashcards (CLI) (flashcard_deck, L1) — Python Automation
  • Python Packaging (Topic Pack, L2) — Python Automation
  • Python for Infrastructure (Topic Pack, L1) — Python Automation
  • Skillcheck: Python Automation (Assessment, L0) — Python Automation
  • Software Development Flashcards (CLI) (flashcard_deck, L1) — Python Automation