Portal | Level: L2: Operations | Topics: Python Packaging, Python Automation | Domain: DevOps & Tooling
Python Packaging - Primer¶
Why This Matters¶
Every Python project, from a one-file deployment script to a full microservice, has dependencies. How you manage those dependencies determines whether your builds are reproducible, whether your Docker images are lean, whether your deploys are safe, and whether "it works on my machine" is a daily headache or a solved problem. The Python packaging ecosystem is notoriously complex — understanding it saves hours of debugging mysterious import errors and broken environments.
pip and PyPI¶
pip is Python's package installer. PyPI (Python Package Index) is the public repository it pulls from by default.
# Install a package
$ pip install requests
# Install a specific version
$ pip install requests==2.31.0
# Install with compatible release (>=2.31.0, <2.32.0)
$ pip install "requests~=2.31.0"
# Install from a requirements file
$ pip install -r requirements.txt
# Show what's installed
$ pip list
$ pip show requests # Details about one package
# Check for outdated packages
$ pip list --outdated
How pip Resolves Dependencies¶
When you pip install flask, pip must resolve Flask's dependencies (Werkzeug, Jinja2, etc.) and their dependencies, recursively. If two packages require incompatible versions of the same dependency, pip will either:
- Backtrack and try different versions (pip 20.3+ has a proper resolver)
- Fail with a resolution error
# See what would be installed (dry run)
$ pip install --dry-run flask
# Force reinstall (useful when things are broken)
$ pip install --force-reinstall flask
# Install without dependencies (dangerous, for debugging only)
$ pip install --no-deps mypackage
Virtual Environments¶
A virtual environment is an isolated Python installation with its own site-packages. Packages installed in one venv don't affect another.
venv (Built-in, Python 3.3+)¶
# Create a virtual environment
$ python3 -m venv .venv
# Activate it
$ source .venv/bin/activate # bash/zsh
$ source .venv/bin/activate.fish # fish
$ .venv\Scripts\activate # Windows cmd
# Your prompt changes:
(.venv) $ which python
/home/user/project/.venv/bin/python
# Install packages (goes into .venv/lib/python3.11/site-packages/)
(.venv) $ pip install requests
# Deactivate
(.venv) $ deactivate
virtualenv (Third-Party, Faster)¶
The third-party virtualenv package predates venv and is a drop-in alternative: it creates environments faster (it seeds pip from a local wheel cache instead of downloading) and supports targeting Python versions other than the one running it. Usage mirrors venv: virtualenv .venv, then activate as above.
Key Virtual Environment Facts¶
- A venv is just a directory structure. Delete it and you're clean.
- The python binary in the venv is a symlink or copy of the system Python.
- pip install inside an active venv installs into the venv only.
- .venv/ should be in .gitignore. Never commit it.
Dependency Specification Files¶
requirements.txt¶
The simplest format. A list of packages, optionally with version constraints.
# Direct dependencies (loose)
requests>=2.28
flask>=3.0
psycopg2-binary>=2.9
# Pinned (reproducible)
requests==2.31.0
flask==3.0.2
psycopg2-binary==2.9.9
Werkzeug==3.0.1
Jinja2==3.1.3
# ... all transitive dependencies pinned too
pip freeze¶
# Dump all installed packages with exact versions
$ pip freeze > requirements.txt
# Problem: includes EVERYTHING in the venv, including dev tools
$ pip freeze
ipdb==0.13.13 # You don't want this in production
pytest==8.0.0 # Or this
requests==2.31.0 # This is what you actually need
pip-compile (pip-tools) — The Better Way¶
pip-tools separates what you want (direct dependencies) from what you get (full resolution).
$ pip install pip-tools
# requirements.in — your direct dependencies
$ cat requirements.in
requests>=2.28
flask>=3.0
gunicorn>=21.0
# Compile to pinned requirements.txt
$ pip-compile requirements.in
# Output: requirements.txt with all transitive deps pinned, with comments showing why
# Compile with hashes for supply chain security
$ pip-compile --generate-hashes requirements.in
# Sync your environment to exactly match requirements.txt
$ pip-sync requirements.txt
# Removes packages not in requirements.txt — keeps environment clean
# Upgrade a specific package
$ pip-compile --upgrade-package requests requirements.in
The generated requirements.txt includes comments showing dependency chains:
# requirements.txt generated by pip-compile
flask==3.0.2
# via -r requirements.in
jinja2==3.1.3
# via flask
markupsafe==2.1.5
# via jinja2
werkzeug==3.0.1
# via flask
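Because those annotations are plain text, they are easy to post-process in CI. The sketch below (a hypothetical helper, handling only the single-line "# via X" form shown above, not pip-compile's multi-line via blocks) recovers which package pulled in each pin:

```python
def parse_via(text: str) -> dict:
    # Map each pinned package to its "# via" annotation.
    # Sketch only: ignores multi-line "# via" blocks.
    chains, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if line and not line.startswith("#") and "==" in line:
            current = line.split("==")[0]
        elif line.startswith("# via ") and current:
            chains[current] = line[len("# via "):]
    return chains

sample = """\
flask==3.0.2
    # via -r requirements.in
jinja2==3.1.3
    # via flask
"""
print(parse_via(sample))  # {'flask': '-r requirements.in', 'jinja2': 'flask'}
```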
pyproject.toml: The Modern Standard¶
PEP 621 defines pyproject.toml as the standard way to declare project metadata and dependencies. It replaces setup.py, setup.cfg, and MANIFEST.in.
[build-system]
requires = ["setuptools>=68.0", "wheel"]
build-backend = "setuptools.build_meta"
[project]
name = "mypackage"
version = "1.2.0"
description = "A useful package"
requires-python = ">=3.9"
dependencies = [
"requests>=2.28,<3.0",
"pydantic>=2.0",
"click>=8.0",
]
[project.optional-dependencies]
dev = [
"pytest>=8.0",
"ruff>=0.3.0",
"ipdb>=0.13",
]
[project.scripts]
mycommand = "mypackage.cli:main"
[tool.setuptools.packages.find]
where = ["src"] # if using src layout
setup.py vs pyproject.toml¶
| Feature | setup.py | pyproject.toml |
|---|---|---|
| Format | Python code (executable) | TOML (declarative) |
| Standard | Legacy | PEP 621 (current) |
| Build backends | setuptools only | Any (setuptools, flit, hatch, pdm) |
| Tool config | Separate files | Unified (ruff, pytest, mypy all in one file) |
| Security | Arbitrary code execution | Declarative, no code execution |
Rule: For new projects, use pyproject.toml. Only keep setup.py for backward compatibility with old tools.
Timeline: Python packaging has gone through four eras: distutils (2000, stdlib), setuptools + setup.py (2004, arbitrary code execution on install), setup.cfg (2016, declarative but still setuptools), and pyproject.toml (PEP 518 in 2017, PEP 621 in 2021). Each era tried to fix the previous one's security and reproducibility problems.
Build Backends: Poetry, PDM, Hatch¶
Poetry¶
$ pip install poetry
# Create a new project
$ poetry new myproject
# Add a dependency
$ poetry add requests
$ poetry add --group dev pytest
# Install dependencies (creates virtualenv automatically)
$ poetry install
# Lock dependencies (poetry.lock — commit this)
$ poetry lock
# Build a wheel
$ poetry build
Poetry uses pyproject.toml with a [tool.poetry] section (not PEP 621 compliant until Poetry 2.0).
PDM¶
$ pip install pdm
# PDM uses PEP 621 natively
$ pdm init
$ pdm add requests
$ pdm install
$ pdm build
Hatch¶
$ pip install hatch
$ hatch new myproject
$ hatch env create
$ hatch run test # Run commands in managed environments
$ hatch build
Which to Choose?¶
- pip-tools: Simplest. Good for applications (not libraries). No lock file format wars.
- Poetry: Most popular. Good ecosystem. Opinionated. Lock file is Poetry-specific.
- PDM: PEP-compliant. Good if you want standards-based tooling.
- Hatch: Official PyPA-endorsed. Good for libraries and multi-environment projects.
For DevOps (deploying applications, not publishing libraries): pip-tools is often the right choice. Simple, transparent, works with plain pip.
Remember: Mnemonic for the pip-tools workflow: "In-file declares, txt-file pins, sync installs." requirements.in = what you want. requirements.txt = what you get (generated by pip-compile). pip-sync = make the environment match exactly. This three-step pattern gives you reproducible builds without a heavy tool like Poetry.
Wheel vs Sdist¶
Python packages are distributed in two formats:
Wheel (.whl)¶
Pre-built binary format. Installs fast (no compilation). Platform-specific for packages with C extensions.
$ pip wheel . -w dist/
# Creates: dist/mypackage-1.0.0-py3-none-any.whl
# py3 = Python 3, none = no ABI dependency, any = any platform
# For packages with C extensions:
# mypackage-1.0.0-cp311-cp311-manylinux_2_17_x86_64.whl
# cp311 = CPython 3.11, manylinux = Linux with glibc 2.17+, x86_64
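The tag triple at the end of a wheel filename follows PEP 427's naming convention, so it can be pulled apart mechanically. A quick sketch (a hypothetical helper that assumes no optional build tag):

```python
def parse_wheel_filename(filename: str) -> dict:
    # PEP 427: {name}-{version}(-{build})?-{python}-{abi}-{platform}.whl
    # Sketch only: ignores build tags and unusual name escaping.
    stem = filename.removesuffix(".whl")
    parts = stem.split("-")
    return {
        "name": parts[0],
        "version": parts[1],
        "python": parts[-3],
        "abi": parts[-2],
        "platform": parts[-1],
    }

print(parse_wheel_filename("mypackage-1.0.0-py3-none-any.whl"))
```

pip performs essentially this match against the running interpreter's supported tags to decide which wheel to download.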
Sdist (Source Distribution)¶
Source code archive. Requires build tools to install. Platform-independent.
Always prefer wheels for deployment. They install faster and don't require a compiler on the target machine. Build wheels in CI, push to your private PyPI.
Editable Installs¶
During development, install your package in "editable" mode so code changes take effect immediately without reinstalling:
# Install your project in editable mode
$ pip install -e .
# Or with extras:
$ pip install -e ".[dev]"
# What this does: creates a .pth file or egg-link that points to your source
# Changes to source files are immediately reflected without reinstalling
Version Pinning Strategies¶
| Strategy | Syntax | When to Use |
|---|---|---|
| Exact pin | requests==2.31.0 | Production deploys, Docker images |
| Compatible release | requests~=2.31.0 | Libraries (allows 2.31.x but not 2.32) |
| Minimum | requests>=2.28 | Direct dependency specs (requirements.in) |
| Range | requests>=2.28,<3.0 | When you know major version breaks compat |
| No pin | requests | Never in production |
Best practice for applications: pin exact versions in requirements.txt (via pip-compile). Keep loose specs in requirements.in. Run pip-compile --upgrade periodically to update.
Best practice for libraries: use minimum version or compatible release in pyproject.toml. Don't over-constrain — let the application resolve versions.
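The compatible-release operator from the table deserves a closer look: under PEP 440, ~=2.31.0 is shorthand for >=2.31.0 together with ==2.31.*. A simplified sketch of that rule for plain X.Y.Z versions (no pre-releases, epochs, or local versions):

```python
def satisfies_compatible(version: str, spec: str) -> bool:
    # "~=2.31.0" means ">=2.31.0, ==2.31.*": at least the given
    # version, and matching it on all but the last component.
    v = tuple(int(p) for p in version.split("."))
    s = tuple(int(p) for p in spec.split("."))
    return v >= s and v[: len(s) - 1] == s[:-1]

print(satisfies_compatible("2.31.5", "2.31.0"))  # True  (patch bump allowed)
print(satisfies_compatible("2.32.0", "2.31.0"))  # False (minor bump excluded)
```

In real code, prefer SpecifierSet from the packaging library, which implements the full PEP 440 rules.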
Private PyPI (Internal Packages)¶
devpi¶
# Install and run devpi server
$ pip install devpi-server devpi-client
$ devpi-server --init
$ devpi-server --start --port 3141
# Upload packages
$ devpi use http://localhost:3141/root/pypi
$ devpi upload dist/mypackage-1.0.0-py3-none-any.whl
# Install from your private index
$ pip install mypackage --index-url http://localhost:3141/root/pypi/+simple/
AWS CodeArtifact¶
# Get auth token
$ aws codeartifact get-authorization-token --domain myorg --query authorizationToken --output text
# Configure pip
$ pip install mypackage --index-url https://aws:TOKEN@myorg-123456789.d.codeartifact.us-east-1.amazonaws.com/pypi/internal/simple/
pip.conf for Default Index¶
# ~/.config/pip/pip.conf (legacy location: ~/.pip/pip.conf; per-venv: .venv/pip.conf)
[global]
index-url = https://pypi.org/simple/
extra-index-url = https://your-private-pypi.example.com/simple/
trusted-host = your-private-pypi.example.com
Docker and Python Packaging¶
Multi-Stage Build Pattern¶
# Stage 1: Build dependencies (includes build tools)
FROM python:3.11-slim AS builder
RUN apt-get update && apt-get install -y --no-install-recommends \
gcc libpq-dev && \
rm -rf /var/lib/apt/lists/*
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt
# Stage 2: Runtime (no build tools, smaller image)
FROM python:3.11-slim
RUN apt-get update && apt-get install -y --no-install-recommends \
libpq5 && \
rm -rf /var/lib/apt/lists/*
COPY --from=builder /install /usr/local
COPY . /app
WORKDIR /app
USER 1000
CMD ["gunicorn", "app:create_app()", "-b", "0.0.0.0:8000"]
Key Docker + Python Rules¶
# Always use --no-cache-dir to reduce image size
RUN pip install --no-cache-dir -r requirements.txt
# Copy requirements.txt before source code for better layer caching
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
# Pin the base image digest for reproducibility
FROM python:3.11-slim@sha256:abc123...
# Don't use virtualenvs in Docker (you already have isolation)
# But if you must (for multi-stage builds), set:
ENV VIRTUAL_ENV=/opt/venv
ENV PATH="$VIRTUAL_ENV/bin:$PATH"
System Packages vs pip Packages¶
On Linux, Python packages come from two sources:
| Source | Location | Manager |
|---|---|---|
| System packages | /usr/lib/python3/dist-packages/ | apt, dnf, yum |
| pip packages | /usr/local/lib/python3.11/dist-packages/ (global) | pip |
| venv packages | .venv/lib/python3.11/site-packages/ | pip |
Never mix system and pip packages. System packages exist for system tools (apt, cloud-init, etc.). Application dependencies go in virtualenvs.
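You can ask the current interpreter which of these locations pip would install into, via the standard sysconfig module. Inside a venv this resolves to the venv's site-packages; on a Debian/Ubuntu system Python it resolves to a dist-packages path:

```python
import sysconfig

# "purelib" is the install target for pure-Python packages under
# the current interpreter's configuration.
print(sysconfig.get_paths()["purelib"])
```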
Fun fact: PEP 668 (the "externally managed environment" error introduced in Python 3.11+) was specifically motivated by Ubuntu and Debian maintainers who were tired of users breaking system Python with pip install --user. The error message is intentionally annoying — it forces you toward venvs, which is the correct practice.
# PEP 668 (Python 3.11+): pip refuses to install system-wide
$ pip install requests
# error: externally-managed-environment
# This is CORRECT behavior. Use a venv.
$ python3 -m venv .venv && source .venv/bin/activate
$ pip install requests # Works fine in the venv
site-packages Layout¶
Understanding where Python looks for packages:
import sys
# Module search path
print(sys.path)
# ['', '/usr/lib/python311.zip', '/usr/lib/python3.11',
# '/usr/lib/python3.11/lib-dynload',
# '/home/user/.venv/lib/python3.11/site-packages']
# Where a specific package is installed
import requests
print(requests.__file__)
# /home/user/.venv/lib/python3.11/site-packages/requests/__init__.py
# All site-packages directories
import site
print(site.getsitepackages())
PYTHONPATH¶
PYTHONPATH prepends directories to sys.path, affecting where Python searches for imports.
# Add a directory to the import path
$ PYTHONPATH=/opt/mylibs:$PYTHONPATH python script.py
# Common use: make a project importable during development
$ PYTHONPATH=. python -m mypackage.cli
# Warning: PYTHONPATH is global. Setting it system-wide affects ALL Python programs.
# Prefer `pip install -e .` over PYTHONPATH hacks.
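For completeness, the same effect as a PYTHONPATH entry can be had at runtime by prepending to sys.path (shown here with a hypothetical /opt/mylibs directory):

```python
import sys

# Equivalent in effect to launching with PYTHONPATH=/opt/mylibs:
# this directory is searched first on subsequent imports.
sys.path.insert(0, "/opt/mylibs")  # hypothetical directory
print(sys.path[0])  # /opt/mylibs
```

As with PYTHONPATH itself, this is a development convenience; for anything lasting, pip install -e . is the cleaner route.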
Dependency Auditing¶
# pip-audit: check for known vulnerabilities
$ pip install pip-audit
$ pip-audit
# Found 2 known vulnerabilities in 1 package
# Name Version ID Fix Versions
# pillow 9.0.0 PYSEC-2023- >=9.3.0
# safety: alternative vulnerability scanner
$ pip install safety
$ safety check
# pip-licenses: check license compliance
$ pip install pip-licenses
$ pip-licenses --format=table
Wiki Navigation¶
Prerequisites¶
- Python for Infrastructure (Topic Pack, L1)
Related Content¶
- Perl Flashcards (CLI) (flashcard_deck, L1) — Python Automation
- Python Async & Concurrency (Topic Pack, L2) — Python Automation
- Python Debugging (Topic Pack, L1) — Python Automation
- Python Drills (Drill, L0) — Python Automation
- Python Exercises (Quest Ladder) (CLI) (Exercise Set, L0) — Python Automation
- Python Flashcards (CLI) (flashcard_deck, L1) — Python Automation
- Python for Infrastructure (Topic Pack, L1) — Python Automation
- Skillcheck: Python Automation (Assessment, L0) — Python Automation
- Software Development Flashcards (CLI) (flashcard_deck, L1) — Python Automation