Portal | Level: L2: Operations | Topics: Python Packaging, Python Automation | Domain: DevOps & Tooling

Python Packaging - Primer

Why This Matters

Every Python project, from a one-file deployment script to a full microservice, has dependencies. How you manage those dependencies determines whether your builds are reproducible, whether your Docker images are lean, whether your deploys are safe, and whether "it works on my machine" is a daily headache or a solved problem. The Python packaging ecosystem is notoriously complex — understanding it saves hours of debugging mysterious import errors and broken environments.

pip and PyPI

pip is Python's package installer. PyPI (Python Package Index) is the public repository it pulls from by default.

# Install a package
$ pip install requests

# Install a specific version
$ pip install requests==2.31.0

# Install with compatible release (>=2.31.0, <2.32.0)
$ pip install "requests~=2.31.0"

# Install from a requirements file
$ pip install -r requirements.txt

# Show what's installed
$ pip list
$ pip show requests    # Details about one package

# Check for outdated packages
$ pip list --outdated

How pip Resolves Dependencies

When you pip install flask, pip must resolve Flask's dependencies (Werkzeug, Jinja2, etc.) and their dependencies, recursively. If two packages require incompatible versions of the same dependency, pip will either:

  • Backtrack and try different versions (pip 20.3+ has a proper resolver)
  • Fail with a resolution error

# See what would be installed (dry run)
$ pip install --dry-run flask

# Force reinstall (useful when things are broken)
$ pip install --force-reinstall flask

# Install without dependencies (dangerous, for debugging only)
$ pip install --no-deps mypackage
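The backtracking behavior can be illustrated with a toy resolver over a made-up index (all package names, versions, and constraints below are invented for illustration; pip's real resolver, built on resolvelib, is far more sophisticated):

```python
# Toy backtracking dependency resolver over a hypothetical index.
# INDEX maps package -> version -> {dependency: allowed versions}.
INDEX = {
    "flask":     {"3.0": {"werkzeug": {"3.0"}},
                  "2.3": {"werkzeug": {"2.3", "3.0"}}},
    "othertool": {"1.0": {"werkzeug": {"2.3"}}},
    "werkzeug":  {"3.0": {}, "2.3": {}},
}

def resolve(reqs, pinned=None):
    """reqs: list of (package, allowed_versions). Returns {pkg: version} or None."""
    pinned = dict(pinned or {})
    if not reqs:
        return pinned
    (pkg, allowed), rest = reqs[0], reqs[1:]
    if pkg in pinned:  # already chosen earlier: must satisfy this constraint too
        return resolve(rest, pinned) if pinned[pkg] in allowed else None
    for version in sorted(allowed & INDEX[pkg].keys(), reverse=True):
        deps = list(INDEX[pkg][version].items())
        solution = resolve(rest + deps, {**pinned, pkg: version})
        if solution is not None:
            return solution
    return None  # no version works: the caller backtracks

# flask 3.0 forces werkzeug 3.0, but othertool needs werkzeug 2.3,
# so the resolver backtracks and settles on flask 2.3.
print(resolve([("flask", {"2.3", "3.0"}), ("othertool", {"1.0"})]))
# → {'flask': '2.3', 'othertool': '1.0', 'werkzeug': '2.3'}
```

When no combination satisfies every constraint, resolve returns None, which corresponds to pip's resolution error.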

Virtual Environments

A virtual environment is an isolated Python installation with its own site-packages. Packages installed in one venv don't affect another.

venv (Built-in, Python 3.3+)

# Create a virtual environment
$ python3 -m venv .venv

# Activate it
$ source .venv/bin/activate        # bash/zsh
$ source .venv/bin/activate.fish   # fish
$ .venv\Scripts\activate           # Windows cmd

# Your prompt changes:
(.venv) $ which python
/home/user/project/.venv/bin/python

# Install packages (goes into .venv/lib/python3.11/site-packages/)
(.venv) $ pip install requests

# Deactivate
(.venv) $ deactivate

virtualenv (Third-Party, Faster)

$ pip install virtualenv
$ virtualenv .venv    # Faster than venv, supports more Python versions

Key Virtual Environment Facts

  • A venv is just a directory structure. Delete it and you're clean.
  • The python binary in the venv is a symlink or copy of the system Python.
  • pip install inside an active venv installs into the venv only.
  • .venv/ should be in .gitignore. Never commit it.
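These facts are easy to verify from Python itself: the stdlib venv module can build an environment programmatically, and the result is nothing more than a directory containing a pyvenv.cfg marker and an interpreter (a minimal sketch; with_pip=False skips seeding pip to keep it fast):

```python
import sys
import tempfile
import venv
from pathlib import Path

# Build a throwaway venv programmatically (no pip seeded, for speed)
target = Path(tempfile.mkdtemp()) / ".venv"
venv.EnvBuilder(with_pip=False).create(target)

# A venv is just a directory: pyvenv.cfg plus bin/ (Scripts/ on Windows)
print(sorted(p.name for p in target.iterdir()))

# pyvenv.cfg records which base Python the venv was created from
print((target / "pyvenv.cfg").read_text().splitlines()[0])
```

Deleting the directory removes the environment completely, exactly as the bullet list says.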

Dependency Specification Files

requirements.txt

The simplest format. A list of packages, optionally with version constraints.

# Direct dependencies (loose)
requests>=2.28
flask>=3.0
psycopg2-binary>=2.9

# Pinned (reproducible)
requests==2.31.0
flask==3.0.2
psycopg2-binary==2.9.9
Werkzeug==3.0.1
Jinja2==3.1.3
# ... all transitive dependencies pinned too

pip freeze

# Dump all installed packages with exact versions
$ pip freeze > requirements.txt

# Problem: includes EVERYTHING in the venv, including dev tools
$ pip freeze
ipdb==0.13.13     # You don't want this in production
pytest==8.0.0     # Or this
requests==2.31.0  # This is what you actually need
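The same information pip freeze and pip show report is also available from the stdlib via importlib.metadata, which is handy in scripts (a sketch; "requests" is only an example and may not be installed in your environment):

```python
from importlib import metadata

# Programmatic equivalent of `pip list`: enumerate installed distributions
installed = {dist.metadata["Name"]: dist.version
             for dist in metadata.distributions()}
for name in sorted(installed)[:5]:
    print(name, installed[name])

# Rough equivalent of `pip show <pkg>` for a single package
try:
    print(metadata.version("requests"))
    print(metadata.requires("requests"))  # declared dependencies, or None
except metadata.PackageNotFoundError:
    print("requests is not installed in this environment")
```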

pip-compile (pip-tools) — The Better Way

pip-tools separates what you want (direct dependencies) from what you get (full resolution).

$ pip install pip-tools

# requirements.in — your direct dependencies
$ cat requirements.in
requests>=2.28
flask>=3.0
gunicorn>=21.0

# Compile to pinned requirements.txt
$ pip-compile requirements.in
# Output: requirements.txt with all transitive deps pinned, with comments showing why

# Compile with hashes for supply chain security
$ pip-compile --generate-hashes requirements.in

# Sync your environment to exactly match requirements.txt
$ pip-sync requirements.txt
# Removes packages not in requirements.txt — keeps environment clean

# Upgrade a specific package
$ pip-compile --upgrade-package requests requirements.in

The generated requirements.txt includes comments showing dependency chains:

# requirements.txt generated by pip-compile
flask==3.0.2
    # via -r requirements.in
jinja2==3.1.3
    # via flask
markupsafe==2.1.5
    # via jinja2
werkzeug==3.0.1
    # via flask
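Because the generated file is plain text, those "# via" annotations are easy to post-process, for example to separate direct pins from transitive ones (a sketch that handles only single-line "via" comments; real pip-compile output may list several parents across multiple comment lines):

```python
import re

# Sample pip-compile output, matching the format shown above
compiled = """\
flask==3.0.2
    # via -r requirements.in
jinja2==3.1.3
    # via flask
markupsafe==2.1.5
    # via jinja2
werkzeug==3.0.1
    # via flask
"""

pins = {}
current = None
for line in compiled.splitlines():
    if m := re.match(r"^([A-Za-z0-9_.-]+)==(\S+)$", line):
        current = m.group(1)
        pins[current] = {"version": m.group(2), "via": []}
    elif current and (m := re.match(r"^\s+# via (.+)$", line)):
        pins[current]["via"].append(m.group(1))

# Direct dependencies arrive "via -r requirements.in"; the rest are transitive
direct = [p for p, info in pins.items()
          if any("requirements.in" in v for v in info["via"])]
print(direct)            # → ['flask']
print(pins["jinja2"])    # transitive: pulled in via flask
```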

pyproject.toml: The Modern Standard

PEP 621 defines pyproject.toml as the standard way to declare project metadata and dependencies. It replaces setup.py, setup.cfg, and MANIFEST.in.

[build-system]
requires = ["setuptools>=68.0", "wheel"]
build-backend = "setuptools.build_meta"

[project]
name = "mypackage"
version = "1.2.0"
description = "A useful package"
requires-python = ">=3.9"
dependencies = [
    "requests>=2.28,<3.0",
    "pydantic>=2.0",
    "click>=8.0",
]

[project.optional-dependencies]
dev = [
    "pytest>=8.0",
    "ruff>=0.3.0",
    "ipdb>=0.13",
]

[project.scripts]
mycommand = "mypackage.cli:main"

[tool.setuptools.packages.find]
where = ["src"]    # if using src layout

setup.py vs pyproject.toml

Feature         setup.py                    pyproject.toml
Format          Python code (executable)    TOML (declarative)
Standard        Legacy                      PEP 621 (current)
Build backends  setuptools only             Any (setuptools, flit, hatch, pdm)
Tool config     Separate files              Unified (ruff, pytest, mypy all in one file)
Security        Arbitrary code execution    Declarative, no code execution

Rule: For new projects, use pyproject.toml. Only keep setup.py for backward compatibility with old tools.

Timeline: Python packaging has gone through four eras: distutils (2000, stdlib), setuptools + setup.py (2004, arbitrary code execution on install), setup.cfg (2016, declarative but still setuptools), and pyproject.toml (PEP 518 in 2017, PEP 621 in 2021). Each era tried to fix the previous one's security and reproducibility problems.

Build Backends: Poetry, PDM, Hatch

Poetry

$ pip install poetry

# Create a new project
$ poetry new myproject

# Add a dependency
$ poetry add requests
$ poetry add --group dev pytest

# Install dependencies (creates virtualenv automatically)
$ poetry install

# Lock dependencies (poetry.lock — commit this)
$ poetry lock

# Build a wheel
$ poetry build

Poetry uses pyproject.toml with a [tool.poetry] section (not PEP 621 compliant until Poetry 2.0).

PDM

$ pip install pdm

# PDM uses PEP 621 natively
$ pdm init
$ pdm add requests
$ pdm install
$ pdm build

Hatch

$ pip install hatch

$ hatch new myproject
$ hatch env create
$ hatch run test    # Run commands in managed environments
$ hatch build

Which to Choose?

  • pip-tools: Simplest. Good for applications (not libraries). No lock file format wars.
  • Poetry: Most popular. Good ecosystem. Opinionated. Lock file is Poetry-specific.
  • PDM: PEP-compliant. Good if you want standards-based tooling.
  • Hatch: Official PyPA-endorsed. Good for libraries and multi-environment projects.

For DevOps (deploying applications, not publishing libraries): pip-tools is often the right choice. Simple, transparent, works with plain pip.

Remember: Mnemonic for the pip-tools workflow: "In-file declares, txt-file pins, sync installs." requirements.in = what you want. requirements.txt = what you get (generated by pip-compile). pip-sync = make the environment match exactly. This three-step pattern gives you reproducible builds without a heavy tool like Poetry.

Wheel vs Sdist

Python packages are distributed in two formats:

Wheel (.whl)

Pre-built binary format. Installs fast (no compilation). Platform-specific for packages with C extensions.

$ pip wheel . -w dist/
# Creates: dist/mypackage-1.0.0-py3-none-any.whl
# py3 = Python 3, none = no ABI dependency, any = any platform

# For packages with C extensions:
# mypackage-1.0.0-cp311-cp311-manylinux_2_17_x86_64.whl
# cp311 = CPython 3.11, manylinux = Linux with glibc 2.17+, x86_64
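Those filename fields follow the fixed pattern defined by PEP 427 (name-version[-build]-python-abi-platform.whl), so a small parser can recover each tag (a simplified sketch; it assumes no build tag, which is rare in practice):

```python
# Wheel filenames follow PEP 427: name-version(-build)-python-abi-platform.whl
def parse_wheel_name(filename: str) -> dict:
    stem = filename.removesuffix(".whl")
    parts = stem.split("-")
    py_tag, abi_tag, platform_tag = parts[-3:]  # tags are always the last three
    return {"name": parts[0], "version": parts[1],
            "python": py_tag, "abi": abi_tag, "platform": platform_tag}

print(parse_wheel_name("mypackage-1.0.0-py3-none-any.whl"))
# → {'name': 'mypackage', 'version': '1.0.0', 'python': 'py3', 'abi': 'none', 'platform': 'any'}

print(parse_wheel_name("mypackage-1.0.0-cp311-cp311-manylinux_2_17_x86_64.whl"))
# a wheel compiled for CPython 3.11 on x86-64 Linux with glibc >= 2.17
```

pip performs this same tag matching (against the running interpreter and platform) when deciding which wheel to download.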

Sdist (Source Distribution)

Source code archive. Requires build tools to install. Platform-independent.

$ python -m build --sdist
# Creates: dist/mypackage-1.0.0.tar.gz

Always prefer wheels for deployment. They install faster and don't require a compiler on the target machine. Build wheels in CI, push to your private PyPI.

Editable Installs

During development, install your package in "editable" mode so code changes take effect immediately without reinstalling:

# Install your project in editable mode
$ pip install -e .
# Or with extras:
$ pip install -e ".[dev]"

# What this does: writes a .pth file (or, with legacy setuptools, an egg-link) pointing at your source
# Changes to source files are immediately reflected without reinstalling

Version Pinning Strategies

Strategy            Syntax                When to Use
Exact pin           requests==2.31.0      Production deploys, Docker images
Compatible release  requests~=2.31.0      Libraries (allows 2.31.x but not 2.32)
Minimum             requests>=2.28        Direct dependency specs (requirements.in)
Range               requests>=2.28,<3.0   When you know major version breaks compat
No pin              requests              Never in production

Best practice for applications: pin exact versions in requirements.txt (via pip-compile). Keep loose specs in requirements.in. Run pip-compile --upgrade periodically to update.

Best practice for libraries: use minimum version or compatible release in pyproject.toml. Don't over-constrain — let the application resolve versions.
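The compatible-release operator from the table can be sketched by hand (simplified: integer-only version segments, no pre-releases or local versions; real tools use the packaging library's SpecifierSet):

```python
# "~=2.31.0" means ">=2.31.0, ==2.31.*" (simplified check, integer segments only)
def compatible_release(version: str, base: str) -> bool:
    v = [int(x) for x in version.split(".")]
    b = [int(x) for x in base.split(".")]
    # version must be >= base and share all but the last segment of base
    return v >= b and v[:len(b) - 1] == b[:len(b) - 1]

print(compatible_release("2.31.5", "2.31.0"))   # → True  (patch bump allowed)
print(compatible_release("2.32.0", "2.31.0"))   # → False (minor bump excluded)
print(compatible_release("2.30.9", "2.31.0"))   # → False (below the minimum)
```

Note how the number of segments in the base matters: ~=2.31 would allow 2.32 but not 3.0, while ~=2.31.0 allows only 2.31.x.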

Private PyPI (Internal Packages)

devpi

# Install and run devpi server
$ pip install devpi-server devpi-client
$ devpi-server --init
$ devpi-server --start --port 3141

# Upload packages
$ devpi use http://localhost:3141/root/pypi
$ devpi upload dist/mypackage-1.0.0-py3-none-any.whl

# Install from your private index
$ pip install mypackage --index-url http://localhost:3141/root/pypi/+simple/

AWS CodeArtifact

# Get auth token
$ aws codeartifact get-authorization-token --domain myorg --query authorizationToken --output text

# Configure pip
$ pip install mypackage --index-url https://aws:TOKEN@myorg-123456789.d.codeartifact.us-east-1.amazonaws.com/pypi/internal/simple/

pip.conf for Default Index

# ~/.config/pip/pip.conf on Linux (legacy location ~/.pip/pip.conf also works; per-venv: .venv/pip.conf)
[global]
index-url = https://pypi.org/simple/
extra-index-url = https://your-private-pypi.example.com/simple/
trusted-host = your-private-pypi.example.com

Docker and Python Packaging

Multi-Stage Build Pattern

# Stage 1: Build dependencies (includes build tools)
FROM python:3.11-slim AS builder

RUN apt-get update && apt-get install -y --no-install-recommends \
    gcc libpq-dev && \
    rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt

# Stage 2: Runtime (no build tools, smaller image)
FROM python:3.11-slim

RUN apt-get update && apt-get install -y --no-install-recommends \
    libpq5 && \
    rm -rf /var/lib/apt/lists/*

COPY --from=builder /install /usr/local
COPY . /app
WORKDIR /app

USER 1000
CMD ["gunicorn", "app:create_app()", "-b", "0.0.0.0:8000"]

Key Docker + Python Rules

# Always use --no-cache-dir to reduce image size
RUN pip install --no-cache-dir -r requirements.txt

# Copy requirements.txt before source code for better layer caching
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .

# Pin the base image digest for reproducibility
FROM python:3.11-slim@sha256:abc123...

# Don't use virtualenvs in Docker (you already have isolation)
# But if you must (for multi-stage builds), set:
ENV VIRTUAL_ENV=/opt/venv
ENV PATH="$VIRTUAL_ENV/bin:$PATH"

System Packages vs pip Packages

On Linux, Python packages come from two sources:

Source           Location                                            Manager
System packages  /usr/lib/python3/dist-packages/                     apt, dnf, yum
pip packages     /usr/local/lib/python3.11/dist-packages/ (global)   pip
venv packages    .venv/lib/python3.11/site-packages/                 pip

(The dist-packages naming is Debian/Ubuntu-specific; RPM-based distros use site-packages throughout.)

Never mix system and pip packages. System packages exist for system tools (apt, cloud-init, etc.). Application dependencies go in virtualenvs.
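You can ask the current interpreter which of these locations it would use (a minimal stdlib sketch; the exact paths vary by distro and Python version):

```python
import sys
import sysconfig

# Where pip would install pure-Python packages for this interpreter
print(sysconfig.get_paths()["purelib"])

# Inside a venv, sys.prefix points at the venv while base_prefix
# points at the Python it was created from
in_venv = sys.prefix != sys.base_prefix
print("running in a venv" if in_venv else "running the system interpreter")
```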

Fun fact: PEP 668 (the "externally managed environment" error, enforced when a distro marks its interpreter as externally managed, e.g. Debian 12+ and Ubuntu 23.04+) was specifically motivated by Debian and Ubuntu maintainers who were tired of users breaking the system Python with pip install. The error message is intentionally annoying — it forces you toward venvs, which is the correct practice.

# PEP 668 (e.g. Debian 12+, Ubuntu 23.04+): pip refuses to install into the system Python
$ pip install requests
# error: externally-managed-environment
# This is CORRECT behavior. Use a venv.

$ python3 -m venv .venv && source .venv/bin/activate
$ pip install requests    # Works fine in the venv

site-packages Layout

Understanding where Python looks for packages:

import sys

# Module search path
print(sys.path)
# ['', '/usr/lib/python311.zip', '/usr/lib/python3.11',
#  '/usr/lib/python3.11/lib-dynload',
#  '/home/user/.venv/lib/python3.11/site-packages']

# Where a specific package is installed
import requests
print(requests.__file__)
# /home/user/.venv/lib/python3.11/site-packages/requests/__init__.py

# All site-packages directories
import site
print(site.getsitepackages())

PYTHONPATH

PYTHONPATH prepends directories to sys.path, affecting where Python searches for imports.

# Add a directory to the import path
$ PYTHONPATH=/opt/mylibs:$PYTHONPATH python script.py

# Common use: make a project importable during development
$ PYTHONPATH=. python -m mypackage.cli

# Warning: PYTHONPATH is global. Setting it system-wide affects ALL Python programs.
# Prefer `pip install -e .` over PYTHONPATH hacks.
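The prepending behavior is easy to confirm from a script (a sketch: it spawns the same interpreter as a child process with PYTHONPATH pointing at a temporary directory):

```python
import os
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as extra_dir:
    # Child process inherits our environment plus a PYTHONPATH entry
    env = dict(os.environ, PYTHONPATH=extra_dir)
    out = subprocess.run(
        [sys.executable, "-c", "import sys; [print(p) for p in sys.path]"],
        env=env, capture_output=True, text=True, check=True,
    )
    # The PYTHONPATH directory shows up in the child's sys.path
    print(extra_dir in out.stdout.splitlines())   # → True
```

The parent process is unaffected, which is exactly why scoping PYTHONPATH to a single command (as in the examples above) is safer than exporting it globally.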

Dependency Auditing

# pip-audit: check for known vulnerabilities
$ pip install pip-audit
$ pip-audit
# Found 2 known vulnerabilities in 1 package
# Name    Version  ID           Fix Versions
# pillow  9.0.0    PYSEC-2023-  >=9.3.0

# safety: alternative vulnerability scanner
$ pip install safety
$ safety check

# pip-licenses: check license compliance
$ pip install pip-licenses
$ pip-licenses --format=table

Wiki Navigation

Prerequisites

  • Perl Flashcards (CLI) (flashcard_deck, L1) — Python Automation
  • Python Async & Concurrency (Topic Pack, L2) — Python Automation
  • Python Debugging (Topic Pack, L1) — Python Automation
  • Python Drills (Drill, L0) — Python Automation
  • Python Exercises (Quest Ladder) (CLI) (Exercise Set, L0) — Python Automation
  • Python Flashcards (CLI) (flashcard_deck, L1) — Python Automation
  • Python for Infrastructure (Topic Pack, L1) — Python Automation
  • Skillcheck: Python Automation (Assessment, L0) — Python Automation
  • Software Development Flashcards (CLI) (flashcard_deck, L1) — Python Automation