
Python Async & Concurrency Footguns

Mistakes that cause silent bugs, performance regressions, deadlocks, and data corruption in concurrent Python.


1. The GIL misconception: threads don't parallelize CPU work

You add 8 threads to speed up a CPU-bound data processing pipeline. It runs slower than single-threaded because CPython's GIL lets only one thread execute Python bytecode at a time, so all 8 threads fight over it. The OS context-switches between them constantly, adding overhead with zero parallelism.

# BAD: threads for CPU-bound work
from concurrent.futures import ThreadPoolExecutor
with ThreadPoolExecutor(max_workers=8) as e:
    results = list(e.map(parse_and_transform, huge_dataset))  # Slower than serial

# GOOD: processes for CPU-bound work
from concurrent.futures import ProcessPoolExecutor

if __name__ == "__main__":  # Required under spawn/forkserver: children re-import this module
    with ProcessPoolExecutor(max_workers=8) as e:
        results = list(e.map(parse_and_transform, huge_dataset))  # Actually parallel

Fix: Use ProcessPoolExecutor for CPU work. Threads are only for I/O-bound work where the GIL is released (network, file, sleep).
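The flip side is worth seeing concretely: threads do help when the work is waiting, because the GIL is released during blocking calls. A minimal sketch, using time.sleep as a stand-in for a network call:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def io_task(_):
    time.sleep(0.2)  # Releases the GIL, standing in for a network call

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as e:
    list(e.map(io_task, range(4)))
elapsed = time.perf_counter() - start
# The four 0.2 s waits overlap: elapsed is ~0.2 s, not ~0.8 s
```

Run the same four tasks serially and you'd wait roughly 0.8 s; with threads the waits overlap because no thread holds the GIL while sleeping.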


2. Blocking the event loop in async code

You call requests.get() or time.sleep() inside an async def function. The entire event loop freezes. Every other coroutine, every other HTTP request being served, all of it stops until the blocking call returns. In a FastAPI app serving 1000 concurrent users, one slow database query with the sync driver blocks everyone.

# BAD: blocks the entire event loop
async def handler():
    data = requests.get("https://slow-api.example.com")  # 5 second block
    time.sleep(1)  # Another 1 second block

# GOOD: use async equivalents
async def handler():
    async with aiohttp.ClientSession() as session:
        async with session.get("https://slow-api.example.com") as resp:
            data = await resp.json()
    await asyncio.sleep(1)

# OK: offload unavoidable sync code to a thread pool
async def handler():
    loop = asyncio.get_running_loop()  # Not get_event_loop(), which is deprecated here
    data = await loop.run_in_executor(None, requests.get, "https://slow-api.example.com")

Fix: Every I/O call in async def must use an async library or be offloaded with run_in_executor. Enable PYTHONASYNCIODEBUG=1 to log callbacks that block the loop for longer than 100 ms.
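Since Python 3.9 there is a tidier spelling of the run_in_executor pattern: asyncio.to_thread. A sketch, with a hypothetical legacy_sync_call standing in for a sync library you can't replace:

```python
import asyncio
import time

def legacy_sync_call():
    time.sleep(0.1)  # Stand-in for a blocking library call
    return "done"

async def handler():
    # Runs the sync call in the default thread pool without blocking
    # the event loop; shorthand for loop.run_in_executor(None, ...)
    return await asyncio.to_thread(legacy_sync_call)

result = asyncio.run(handler())
```

to_thread also propagates contextvars to the worker thread, which run_in_executor does not do for you.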


3. Mixing asyncio and threads incorrectly

You run an event loop in the main thread, then call asyncio.run() from a worker thread. Instead of scheduling onto the existing loop, asyncio.run() spins up a second, independent loop in that thread, so the coroutine cannot interact with anything bound to the main loop (its queues, futures, or connections). Or you try to await a coroutine from sync code, which is a SyntaxError. The only safe bridge from a thread into a running loop is asyncio.run_coroutine_threadsafe().

# BAD: calling async code from a thread
def thread_worker():
    result = asyncio.run(some_coroutine())  # Second loop; can't touch objects bound to the main loop

# BAD: directly awaiting from non-async code
def thread_worker():
    result = await some_coroutine()  # SyntaxError outside async def

# GOOD: use run_coroutine_threadsafe
def thread_worker(loop):
    future = asyncio.run_coroutine_threadsafe(some_coroutine(), loop)
    result = future.result(timeout=10)  # Blocks this thread, not the loop

Fix: Use asyncio.run_coroutine_threadsafe(coro, loop) to schedule coroutines onto the main loop from threads. Capture the loop with asyncio.get_running_loop() inside async code and pass it to the thread explicitly.
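Putting the pieces together, a self-contained sketch: the loop is captured inside async code, handed to the thread, and the thread schedules work back onto it (the polling loop in main is just to keep this example alive while the thread waits):

```python
import asyncio
import threading

async def some_coroutine():
    await asyncio.sleep(0.05)
    return 42

def thread_worker(loop, out):
    # Schedule the coroutine on the loop owned by the main thread
    future = asyncio.run_coroutine_threadsafe(some_coroutine(), loop)
    out.append(future.result(timeout=5))  # Blocks this thread, not the loop

async def main():
    loop = asyncio.get_running_loop()  # Capture the loop while inside it
    out = []
    t = threading.Thread(target=thread_worker, args=(loop, out))
    t.start()
    # Keep the loop running while the thread waits on the future
    while t.is_alive():
        await asyncio.sleep(0.01)
    t.join()
    return out

results = asyncio.run(main())
```

In a real app the loop usually stays busy serving other work, so no polling is needed; the key lines are get_running_loop() and run_coroutine_threadsafe().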


4. Fork safety: forking after threading

On Linux, multiprocessing defaults to fork (until Python 3.14, which switched the default to forkserver). If you fork a process after threads have been created, the child inherits copies of all locks in whatever state they were in at the moment of the fork. If a thread held a lock when the fork happened, the child process has a permanently locked mutex. Deadlock on first use.

# BAD: fork after threads exist
import threading
import multiprocessing

t = threading.Thread(target=background_task)
t.start()
# Later...
p = multiprocessing.Process(target=worker)  # fork() copies broken lock state
p.start()  # May deadlock immediately

# GOOD: set spawn method before any threads
multiprocessing.set_start_method("spawn")  # Or "forkserver"

Fix: Call multiprocessing.set_start_method("spawn") at program start, before creating threads. Spawn is already the default on macOS and Windows, but Linux defaults to fork until Python 3.14.
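set_start_method() can only be called once per process, which bites in libraries. A sketch of the per-call alternative, multiprocessing.get_context, which gives you a local "spawn" context without touching global state:

```python
import multiprocessing as mp

def worker(q):
    q.put("started fresh")  # Spawned child has no inherited lock state

if __name__ == "__main__":  # Required: spawn re-imports this module in the child
    # A local context; does not mutate the global start method
    ctx = mp.get_context("spawn")
    q = ctx.Queue()
    p = ctx.Process(target=worker, args=(q,))
    p.start()
    print(q.get(timeout=10))
    p.join()
```

ProcessPoolExecutor also accepts such a context via its mp_context parameter, so a library can insist on spawn without affecting the host application.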


5. Zombie processes from multiprocessing

You create Process objects but never call join(). The child processes finish but their entries stay in the process table as zombies. In a long-running service, zombie processes accumulate until the OS hits the PID limit.

# BAD: processes never joined
processes = []
for chunk in data_chunks:
    p = multiprocessing.Process(target=worker, args=(chunk,))
    p.start()
    processes.append(p)
# Long-running parent never joins them - zombies accumulate

# GOOD: always join or use a Pool/context manager
for p in processes:
    p.join(timeout=60)
    if p.is_alive():
        p.kill()
        p.join()

# BETTER: use Pool or ProcessPoolExecutor (handles lifecycle)
with ProcessPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(worker, data_chunks))

Fix: Always join() processes. Prefer ProcessPoolExecutor or multiprocessing.Pool which manage process lifecycle automatically.


6. Shared mutable state without locks

You have multiple threads incrementing a counter, appending to a list, or updating a dictionary. Python's GIL does not make these operations atomic. counter += 1 is actually three bytecode operations: load, add, store. A thread switch between load and store means lost updates.

# BAD: race condition
shared_counter = 0
def worker():
    global shared_counter
    for _ in range(100_000):
        shared_counter += 1  # Not atomic: LOAD, ADD, STORE

# After 4 threads: shared_counter typically ends up < 400_000 (lost updates)

# GOOD: use a lock
lock = threading.Lock()
def worker():
    global shared_counter
    for _ in range(100_000):
        with lock:
            shared_counter += 1

# BETTER: use thread-safe data structures
from queue import Queue
from collections import Counter
# Or use threading.local() for per-thread state

Fix: Protect every read-modify-write on shared state with a lock. Or redesign to avoid shared mutable state entirely (message passing via queues).
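The message-passing redesign can be sketched in a few lines: each thread accumulates privately and hands off one result through a Queue, which does its own locking internally:

```python
import threading
from queue import Queue

results = Queue()  # Queue handles its own locking

def worker(n):
    local = 0            # Accumulate privately; no shared mutable state
    for _ in range(n):
        local += 1
    results.put(local)   # One thread-safe hand-off at the end

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

total = sum(results.get() for _ in range(4))
# total == 40_000, with no explicit lock in user code
```

Besides correctness, this is faster than the locked version: the threads contend once per worker instead of once per increment.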


7. Async context manager not used (resource leak)

You create an aiohttp.ClientSession() but forget to use async with. The session is never properly closed. Connections pile up. After hundreds of requests, you hit the file descriptor limit and get OSError: [Errno 24] Too many open files.

# BAD: session never closed
async def fetch_many(urls):
    session = aiohttp.ClientSession()  # Never closed!
    for url in urls:
        async with session.get(url) as resp:
            yield await resp.json()
    # ResourceWarning: unclosed connector

# GOOD: use async with
async def fetch_many(urls):
    async with aiohttp.ClientSession() as session:
        for url in urls:
            async with session.get(url) as resp:
                yield await resp.json()

Fix: Always use async with for sessions, connections, file handles, and any resource that has an __aenter__/__aexit__. Run Python with warnings enabled (python -W always, or -W error::ResourceWarning in tests) to catch ResourceWarning.
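The same guarantee is easy to provide for your own resources with contextlib.asynccontextmanager. A sketch, with a hypothetical FakeConnection standing in for a real network client:

```python
import asyncio
from contextlib import asynccontextmanager

class FakeConnection:
    """Stand-in for a real connection object (assumption for this sketch)."""
    def __init__(self):
        self.closed = False
    async def close(self):
        self.closed = True

@asynccontextmanager
async def open_connection():
    conn = FakeConnection()
    try:
        yield conn
    finally:
        await conn.close()  # Cleanup runs even if the body raises

async def main():
    async with open_connection() as conn:
        pass  # Use the connection
    return conn.closed

closed = asyncio.run(main())
# closed is True: the finally block ran when the async with exited
```

The try/finally around yield is what makes this leak-proof: cancellation or an exception inside the block still reaches close().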


8. Deadlocks from lock ordering

Thread A acquires lock_x then tries to acquire lock_y. Thread B acquires lock_y then tries to acquire lock_x. Both threads block forever waiting for the other's lock.

# BAD: inconsistent lock ordering
lock_x = threading.Lock()
lock_y = threading.Lock()

def thread_a():
    with lock_x:       # Holds X
        with lock_y:   # Waits for Y -> DEADLOCK
            do_work()

def thread_b():
    with lock_y:       # Holds Y
        with lock_x:   # Waits for X -> DEADLOCK
            do_work()

# GOOD: always acquire locks in the same order
def thread_a():
    with lock_x:
        with lock_y:
            do_work()

def thread_b():
    with lock_x:       # Same order as thread_a
        with lock_y:
            do_work()

# BETTER: use a single coarser lock or lock-free design

Fix: Establish a global lock ordering and acquire locks in the same order everywhere. Acquiring with a timeout, lock.acquire(timeout=5), turns a silent deadlock into a detectable failure.
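One common way to enforce the ordering mechanically, sketched here, is a helper that always acquires locks sorted by id(), so the caller's argument order stops mattering:

```python
import threading
from contextlib import ExitStack

def acquire_in_order(*locks):
    """Acquire locks sorted by id() so every caller uses one global order,
    making lock-ordering deadlocks impossible within a single run."""
    stack = ExitStack()
    for lock in sorted(locks, key=id):
        stack.enter_context(lock)
    return stack

lock_x = threading.Lock()
lock_y = threading.Lock()
done = []

def thread_a():
    for _ in range(1_000):
        with acquire_in_order(lock_x, lock_y):
            pass
    done.append("a")

def thread_b():
    for _ in range(1_000):
        with acquire_in_order(lock_y, lock_x):  # Reversed args, same acquire order
            pass
    done.append("b")

ta = threading.Thread(target=thread_a)
tb = threading.Thread(target=thread_b)
ta.start()
tb.start()
ta.join(timeout=10)
tb.join(timeout=10)
```

Without the helper, this exact pair of call sites is the textbook deadlock; with it, both threads finish every run. Note that id() ordering is only stable within one process, which is all the deadlock argument needs.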


9. ThreadPoolExecutor default sizing

ThreadPoolExecutor defaults to min(32, os.cpu_count() + 4) workers (since Python 3.13 it uses os.process_cpu_count() instead). On a 64-core machine that is 32 threads. If each thread opens a database connection, you just opened 32 connections per executor instance. Three executors in your app = 96 connections, blowing the database connection limit.

# BAD: default sizing hits resource limits
with ThreadPoolExecutor() as executor:  # 32 threads on a 64-core box
    results = list(executor.map(db_query, queries))
    # 32 simultaneous DB connections

# GOOD: size based on the resource you're consuming
with ThreadPoolExecutor(max_workers=5) as executor:  # Match your DB pool size
    results = list(executor.map(db_query, queries))

Fix: Always set max_workers explicitly based on the bottleneck resource (DB pool size, API rate limit, available file descriptors), not CPU count.
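When the pool size can't simply match the bottleneck (say, only some tasks touch the database), a BoundedSemaphore can cap the scarce resource independently of worker count. A sketch with a fake query and an instrumented peak counter (the meter lock and peak variable exist only to prove the cap holds):

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

DB_POOL_SIZE = 3
db_slots = threading.BoundedSemaphore(DB_POOL_SIZE)
active = 0
peak = 0
meter = threading.Lock()

def db_query(q):
    global active, peak
    with db_slots:               # At most DB_POOL_SIZE concurrent "connections"
        with meter:
            active += 1
            peak = max(peak, active)
        time.sleep(0.02)         # Stand-in for the actual query
        with meter:
            active -= 1
    return q

with ThreadPoolExecutor(max_workers=16) as ex:  # Worker count can stay high
    results = list(ex.map(db_query, range(32)))
# peak never exceeds DB_POOL_SIZE, even with 16 workers
```

The executor handles scheduling; the semaphore enforces the resource budget. The two concerns stay independently tunable.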


10. ProcessPoolExecutor pickle requirement

ProcessPoolExecutor serializes function arguments and return values using pickle to send them between processes. Lambdas, inner functions, open file handles, database connections, and many complex objects cannot be pickled. The error message is often unhelpful: Can't pickle <class>.

# BAD: lambda can't be pickled
with ProcessPoolExecutor() as executor:
    results = executor.map(lambda x: x**2, range(100))
    # PicklingError: Can't pickle <function <lambda>> (raised when results are consumed)

# BAD: passing unpicklable objects
with ProcessPoolExecutor() as executor:
    executor.submit(process, db_connection)  # Can't pickle connection

# GOOD: use module-level functions and pass serializable data
def square(x):
    return x ** 2

with ProcessPoolExecutor() as executor:
    results = list(executor.map(square, range(100)))

# GOOD: create connections inside the worker
def db_worker(query):
    conn = create_connection()  # Created in worker process
    try:
        return conn.execute(query)
    finally:
        conn.close()

Fix: Use top-level named functions (not lambdas or closures). Pass only serializable data (strings, numbers, dicts, lists). Create non-serializable resources (connections, file handles) inside the worker process.
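When a worker needs extra fixed arguments, the picklable replacement for a closure is functools.partial over a top-level function. A sketch that round-trips through pickle, exactly as ProcessPoolExecutor would:

```python
import pickle
from functools import partial

def power(exponent, x):
    # Top-level function: picklable by its qualified name
    return x ** exponent

square = partial(power, 2)  # A partial of a top-level function also pickles

# Round-trip through pickle, as the executor does when sending work to a child
restored = pickle.loads(pickle.dumps(square))
# restored(5) == 25
```

This works because pickle stores partial as a reference to power plus the bound arguments, whereas a lambda or closure has no importable name to reference.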