In the first post of this Substack series, we will begin with a fundamental aspect of every Python application: Python project management. Starting with simple ideas and issues, we will gradually progress to more complex scenarios. Along the way, we will explore concepts and tools that help us address the most common dependency problems our applications encounter. This post aims to benefit both researchers and data scientists with little experience in application development, as well as engineers who are already proficient with Python.
Readers who are already technically proficient, or those eager to dive right in, can skip to the conclusions or explore the py-manage repository. The repository provides direct implementations of the discussed concepts for both standard and mono-repository setups.
Table of Contents
Scope of the Article
Motivation
Understanding the Problem
Unspecified Dependency Versions
Sub-Dependencies
Lack of a .lock File
Project Environment Management
Python Virtual Environments
Managing Python Versions
Isolating Global Python CLI Applications
Managing Python Project Dependencies
Poetry Configuration
Python Project Configuration
Workflows
Starting a New Project
Installing an Existing Project
Developing Locally
Continuous Integration (CI) Pipeline
Project Structure
Standard Structure
Mono-Repository Structure
Conclusions
Appendix
pip freeze
Additional Python Dependency Managers
Scope of the Article
This article explores how to manage Python project environments and dependencies, as well as how to structure projects effectively. Given the breadth of these topics, we will not cover building Docker images for Python services and their deployment within this article. Those subjects will be addressed in upcoming posts.
Motivation
You have to update the request schema for one of the endpoints in an old FastAPI web service and deploy the new version. You locate the long-forgotten service sub-directory in the mono-repo, adjust the pydantic model, modify a single function, and update one unit test. While doing that, you notice the requirements.txt file:
boto3
cryptography
fastapi
matplotlib
mypy
nltk
numpy==1.16.4
pandas
pymysql
pyyaml<=5.0.0
pytest
requests
s3fs
scikit-learn>=1.0.0
scipy
seaborn
spacy
sqlalchemy
typing-extensions
ujson==4.0.2
uvicorn
Something looks off, but you’re not sure what. Regardless, you don’t have time to set up the virtual environment locally to run the tests, so you commit your changes and push them to GitHub, letting GitHub Actions handle the rest. Unfortunately, within two minutes, you receive an email informing you that the CI pipeline has failed. You click the link in the email and see red text in the GitHub workflow logs:
× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.
You don't understand the error since you haven't changed the requirements.txt file, and there have been no CI failures for months. After fifteen minutes of examining the workflow, you realize the pipeline runs only when a git diff is detected—and there hasn't been one in a while. To fix the pip install step, you adjust the dependency versions in the requirements.txt file. After an hour and a half, you manage to pin dependencies to specific versions, allowing the pip install to succeed locally. You push the new commit and observe the CI pipeline. The pip install step succeeds. Just as you think the pull request is ready for review, you receive an email stating the CI pipeline has failed again. Clicking the link, you see an error log from pytest showing that half of the test cases have failed. What is happening here?
Understanding the Problem
While the scenario described above might seem foreign to users of languages like Rust (with its Cargo.lock) and Go (with its go.sum), this experience is unfortunately all too common in Python projects. This section will explore what went wrong in the hypothetical scenario.
Unspecified Dependency Versions
The requirements.txt file shown above isn’t a file you want to see in your production-facing service. Most of the dependencies do not have their versions specified, which means that the latest available version of each dependency will be installed during pip install. Under these circumstances, there is a significant probability that a new major release (as defined by standard semantic versioning) of a dependency will break or alter the behavior of your application. Even minor or patch updates can have unexpected consequences, as semantic versioning is applied by the project's maintainers, and errors occasionally slip in. Given the high number of dependencies, the probability of behavior-breaking changes increases rapidly over time.
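To make the contrast concrete, here is a small, purely illustrative fragment showing the three kinds of specifiers appearing in the file above:
fastapi                # unpinned: whatever the latest release is at install time
scikit-learn>=1.0.0    # lower-bounded: any version from 1.0.0 upward, including breaking majors
numpy==1.16.4          # strictly pinned: always resolves to the same release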
But even if the requirements.txt file strictly pinned dependency versions (using ==), there would still be another issue hiding under the surface.
Sub-Dependencies
A sub-dependency (also known as a transitive dependency) is a dependency of a direct dependency. For example, if your project depends on FastAPI, and FastAPI, in turn, depends on pydantic, then pydantic is a sub-dependency of your project.
PyPI packages are designed for code distribution and will often have their own dependencies specified to cover a wide range of versions—for example, the Pydantic versions used in FastAPI. This makes sense because the consumers of these packages are other developers with varying environments. The goal is to make the package compatible with as many environments as possible, excluding the specific versions that break it. However, the issue of patch or minor updates (if they are allowed within the package) still applies—there is no guarantee that an update in one of its dependencies won’t break the package as a whole. To maximize distribution, strict version pinning should be done on the consumer side.
Additionally, some projects are less mature than others. For instance, while you might trust projects like FastAPI and Pydantic to adhere to semantic versioning and thoroughly test their releases, newer or less established projects, such as “cutting-edge LLM API-wrapper” packages, may not be as diligent. These projects might introduce breaking changes in minor or patch updates. Moreover, they often use loose version specifications, such as >=, in their requirements.txt or pyproject.toml files, which allow major (API-breaking) updates.
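As a hypothetical illustration (the package name is invented), even a fully pinned requirements.txt leaves transitive versions floating:
# requirements.txt pins the direct dependency...
llm-wrapper==0.3.1
# ...but llm-wrapper's own metadata may declare something loose, such as:
#   requests>=2.0
# so a fresh pip install can pull a brand-new requests release
# that llm-wrapper was never tested against.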
In such cases, one would ideally like a mechanism that captures the state of the world at the time of a successful application build and allows that state to be reused later. This is where the concept of the .lock file comes in.
Lack of a .lock File
A .lock file is used to ensure the consistency and reproducibility of software builds by locking the versions of dependencies. When a project specifies its dependencies, those dependencies often have their own dependencies (transitive dependencies). The .lock file records the exact versions of all dependencies (both direct and transitive) that were resolved during a build or dependency installation process. Key benefits of a .lock file would be:
Consistency: Ensures that every environment that builds the project uses the exact same versions of dependencies.
Reproducibility: Makes builds reproducible by providing a snapshot of the entire dependency tree with exact versions.
Integrity: Helps verify the integrity of the dependencies by recording checksums.
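For a taste of what such a file records, here is an abridged entry from a poetry.lock file (Poetry is introduced later in this post); the checksums are elided:
[[package]]
name = "fastapi"
version = "0.111.0"
description = "FastAPI framework, high performance, easy to learn, fast to code, ready for production"
optional = false
python-versions = ">=3.8"
files = [
    {file = "fastapi-0.111.0-py3-none-any.whl", hash = "sha256:..."},
    {file = "fastapi-0.111.0.tar.gz", hash = "sha256:..."},
]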
Languages like Rust (Cargo.lock) and Go (go.sum) have implemented these concepts in their native package managers. However, this is not the case for Python's pip. Currently, Python does not have a standardized file format for ensuring the reproducibility of dependencies. Although there have been recent attempts, such as PEP 665, they have not been successful.
Luckily, new tools have been developed to tackle this problem in Python's ecosystem. As we delve further into this post, those familiar with JavaScript/TypeScript will find these tools reminiscent of npm and yarn.
Project Environment Management
This section will explore tools and approaches that enable us to efficiently address the problems mentioned earlier. However, before diving in, it is essential to thoroughly understand the concept of Python virtual environments, as it is crucial for comprehending all aspects of tooling.
Python Virtual Environments
When more than one project is involved, the “global Python approach” (all dependencies are installed in a single global environment) quickly breaks down. Different projects almost always require different versions of the same dependencies, which leads to conflicts. Transitive dependencies compound this problem, creating a scenario known as dependency hell. In addition, installing dependencies globally can interfere with system tools that rely on the global Python environment, particularly on Linux and macOS, which use preinstalled Python for internal tasks. Lastly, different projects may require different Python versions due to legacy issues or specific technical needs. This is where the virtual environments come to the rescue:
The venv module supports creating lightweight “virtual environments”, each with their own independent set of Python packages installed in their site directories. A virtual environment is created on top of an existing Python installation, known as the virtual environment’s “base” Python, and may optionally be isolated from the packages in the base environment, so only those explicitly installed in the virtual environment are available.
They provide benefits such as:
Dependency isolation - each project has its own set of dependencies.
System integrity - avoid installing packages globally, which might interfere with system tools.
Reproducibility - make it easier to recreate the same environment on different machines, ensuring the code runs the same way everywhere.
Multiple Python versions - allow you to work with different versions of Python for different projects.
Therefore, using a separate virtual environment for each project is highly recommended. Creating a virtual environment in Python is straightforward:
python3 --version
$ Python 3.12.3
# Creates virtual environment called .venv in the current directory.
python3 -m venv .venv
# Activates the created virtual environment
source .venv/bin/activate
Within a virtual environment, we have a standalone Python interpreter and standalone site-packages, but the standard library remains dependent on the base Python installation. This is why the virtual environment created using the “standard” flow is lightweight:
du -sh .venv
$ 15M    .venv
In the official documentation, we can find more details on how virtual environments work:
When a Python interpreter is running from a virtual environment, sys.prefix and sys.exec_prefix point to the directories of the virtual environment, whereas sys.base_prefix and sys.base_exec_prefix point to those of the base Python used to create the environment. It is sufficient to check sys.prefix != sys.base_prefix to determine if the current interpreter is running from a virtual environment.
and the latter is quite easy to check yourself:
source .venv/bin/activate
python
$ Python 3.12.3 (main, May 27 2024, 00:56:53) [Clang 15.0.0 (clang-1500.3.9.4)] on darwin
$ Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> print(sys.base_prefix)
$ /Users/user/.pyenv/versions/3.12.3
>>> print(sys.prefix)
$ /Users/user/example-project/.venv
By following the sys.base_prefix of the virtual environment, we can also find the standard library of the base Python installation:
ls -l /Users/user/.pyenv/versions/3.12.3/lib/python3.12
$ ...
$ -rw-r--r-- 1 user staff 6538 May 27 00:57 abc.py
$ -rw-r--r-- 1 user staff 34211 May 27 00:57 aifc.py
$ -rw-r--r-- 1 user staff 500 May 27 00:57 antigravity.py
$ -rw-r--r-- 1 user staff 101454 May 27 00:57 argparse.py
$ -rw-r--r-- 1 user staff 64260 May 27 00:57 ast.py
$ drwxr-xr-x 36 user staff 1152 May 27 00:57 asyncio
$ ...
du -sh /Users/user/.pyenv/versions/3.12.3/lib/python3.12
236M /Users/user/.pyenv/versions/3.12.3/lib/python3.12
Even if a virtual environment is created using python3 -m venv --copies .venv, it will break if the base Python installation is deleted. This command copies the Python binary into the virtual environment instead of symlinking it (where the platform allows), but the virtual environment still relies on the base installation's standard library and other resources. If the base Python is removed, you will encounter cryptic errors such as:
source .venv/bin/activate
python3
$ dyld[12396]: Library not loaded: /Users/user/.pyenv/versions/3.12.3/lib/libpython3.12.dylib
Referenced from: <76C880DE-AA71-36CE-A443-AB670961D4FB> /Users/user/code/example/.venv/bin/python3
Reason: tried: '/Users/user/.pyenv/versions/3.12.3/lib/libpython3.12.dylib' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/Users/user/.pyenv/versions/3.12.3/lib/libpython3.12.dylib' (no such file), '/Users/user/.pyenv/versions/3.12.3/lib/libpython3.12.dylib' (no such file)
zsh: abort python3
This deeper dive into the implementation details of virtual environments highlights two crucial insights:
Virtual environments depend on the base Python installation.
Virtual environments have the same Python version as the base Python installation used to create them.
Managing Python Versions
From this section, we will start introducing concepts and tooling meant to address the problems that were previously analyzed. We will begin at the top level—managing Python versions. The last section highlighted a crucial point: to create isolated virtual environments with specific Python versions, we must effectively manage the Python versions themselves. This is where pyenv comes in.
As pyenv states:
pyenv lets you easily switch between multiple versions of Python. It's simple, unobtrusive, and follows the UNIX tradition of single-purpose tools that do one thing well.
The way pyenv does that:
At a high level, pyenv intercepts Python commands using shim executables injected into your PATH, determines which Python version has been specified by your application, and passes your commands along to the correct Python installation.
Installing pyenv is straightforward, and there are various workflows to choose from. However, for most situations, I recommend using the local workflow, which can be summarized as follows:
Navigate to your project's root directory.
Check the desired Python version specified in the pyproject.toml file.
Run pyenv versions to verify whether the desired Python version is already installed.
If the desired version is not installed, execute pyenv install <version>.
Finally, set the local Python version by running pyenv local <version>.
After completing these steps, pyenv will automatically select the specified Python version whenever you are in the current directory or any of its subdirectories.
This approach is simple to follow and effective across multiple projects.
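Condensed into shell commands, the workflow looks like this (3.12.3 is just an example version):
cd path/to/project   # navigate to the project root
pyenv versions       # check whether the version from pyproject.toml is installed
pyenv install 3.12.3 # install the desired version if it is missing
pyenv local 3.12.3   # pin it for this directory (writes a .python-version file)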
Isolating Global Python CLI Applications
In this section, let's get a bit ahead of ourselves. Tools like pyenv do not depend on Python itself (pyenv, for example, is written in pure shell script). But what if we want to use CLI tools that do depend on Python? In that case, all the problems discussed above apply, and we would like to isolate these tools in their own virtual environments. For this problem, there is another great tool - pipx:
pipx is a tool to help you install and run end-user applications written in Python. It's roughly similar to macOS's brew, JavaScript's npx, and Linux's apt.
…
pipx is made specifically for application installation, as it adds isolation yet still makes the apps available in your shell: pipx creates an isolated environment for each application and its associated packages.
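As a quick sketch, installing a Python-based CLI tool with pipx looks like this (Homebrew shown for macOS; see the pipx documentation for other platforms):
brew install pipx   # install pipx itself, outside of any project environment
pipx ensurepath     # make pipx-managed apps available on your PATH
pipx install poetry # each app gets its own isolated virtual environment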
With pipx installed, we can progress to the next section.
Managing Python Project Dependencies
By this point, we already have a few tools:
pyenv to manage Python versions.
pipx to manage global Python CLI applications (if needed).
However, we still need a tool to resolve and install Python project dependencies while generating a .lock file to ensure reproducibility. This is where Poetry comes into play:
Poetry helps you declare, manage and install dependencies of Python projects, ensuring you have the right stack everywhere.
Poetry replaces setup.py, requirements.txt, setup.cfg, MANIFEST.in and Pipfile with a simple pyproject.toml based project format.
Since Poetry depends on Python, it is recommended that you install it using pipx, which we introduced in the previous section. A few workflows can be used with Poetry, but below, I’ll cover the one I find to be the most efficient and robust.
Poetry Configuration
To configure Poetry, create a poetry.toml file within your project. There, I would suggest adding the following configuration:
[virtualenvs]
in-project = true
create = true
Setting in-project = true instructs Poetry to create the virtual environment within the project directory. This configuration has the advantage of keeping all project-related files in one location. It is also the standard location of virtual environments for Python projects, and as a result, IDEs will automatically detect the virtual environment, eliminating the need for additional configuration.
Setting create = true ensures that Poetry will automatically create a virtual environment within the project if one does not already exist. This convenient configuration allows other developers to simply run poetry install to set up the entire project.
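If you prefer not to write poetry.toml by hand, the same local configuration can be produced with Poetry's config command:
poetry config virtualenvs.in-project true --local
poetry config virtualenvs.create true --local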
Python Project Configuration
The pyproject.toml file is a configuration file used in Python projects to specify build-system requirements and package dependencies. Introduced by PEP 518, it provides a standardized way to declare the information necessary for building and managing a Python project.
Unfortunately, Poetry does not adhere to the PEP 621 standard for representing project metadata in the pyproject.toml file. Instead, it uses its own custom [tool.poetry] table. Therefore, the configurations detailed below are specific to Poetry. Note that this custom implementation may change in a future major release.
With poetry.toml in place, the easiest way to start a Python project is to set the local Python version and execute poetry init:
# Commands
pyenv local 3.12.3
poetry init
# Output
This command will guide you through creating your pyproject.toml config.
Package name [py-manage]:
Version [0.1.0]:
Description []: A comprehensive guide to managing environments for Python projects.
Author [Martynas Subonis <martynas.subonis@gmail.com>, n to skip]:
License []: MIT
Compatible Python versions [^3.12]: ~3.12.3
Would you like to define your main dependencies interactively? (yes/no) [yes] no
Would you like to define your development dependencies interactively? (yes/no) [yes] no
Generated file
[tool.poetry]
name = "py-manage"
version = "0.1.0"
description = "A comprehensive guide to managing environments for Python projects."
authors = ["Martynas Subonis <martynas.subonis@gmail.com>"]
license = "MIT"
readme = "README.md"
[tool.poetry.dependencies]
python = "~3.12.3"
[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
Do you confirm generation? (yes/no) [yes] yes
Confirming the generation will create a pyproject.toml file with the following contents:
[tool.poetry]
name = "py-manage"
version = "0.1.0"
description = "A comprehensive guide to managing environments for Python projects."
authors = ["Martynas Subonis <martynas.subonis@gmail.com>"]
license = "MIT"
readme = "README.md"
[tool.poetry.dependencies]
python = "~3.12.3"
[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
Running poetry install now will create an empty .venv within the project, as well as an empty .lock file, as the project currently has no dependencies. The next step is to start adding dependencies:
poetry add fastapi==0.111.0
poetry add uvicorn==0.30.1
poetry add --group=dev mypy==1.10.0
poetry add --group=dev ruff==0.4.8
Note: adding dependencies with poetry add will:
Add the dependency to the pyproject.toml.
Install the dependency in the virtual environment.
Update the .lock file with the dependency and its transitive dependency requirements.
Note: always remove dependencies through Poetry as well (poetry remove <package>). Do not circumvent the dependency management tool, or your .lock file will fall out of sync.
Another important aspect to keep in mind is the management of dependency groups. By default, poetry add will add dependencies to the main group under [tool.poetry.dependencies]. Only runtime dependencies should be placed here, and for production environments, they should be installed using the --only main option (poetry install --only main).
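After the poetry add commands above, the dependency sections of pyproject.toml will look like this:
[tool.poetry.dependencies]
python = "~3.12.3"
fastapi = "0.111.0"
uvicorn = "0.30.1"

[tool.poetry.group.dev.dependencies]
mypy = "1.10.0"
ruff = "0.4.8"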
Regardless of which tool is used for dependency management, a common mistake is adding all dependencies, including test runners, type checkers, linters, and formatters, as runtime dependencies. These should only be present in the development environment where such checks are performed. Adding these dependencies to the runtime group unnecessarily inflates the size of the shippable software. It might even allow someone to introduce structural bugs where runtime code depends on dependencies that should only be used in the development setting.
Depending on your use case, it might be beneficial to introduce multiple groups to your project, such as docs (for documentation), cli (for command-line interface installation), etc., and make some of them optional—that is, they are not installed by default unless specified.
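A minimal sketch of an optional group (the docs group and the mkdocs pin here are illustrative):
[tool.poetry.group.docs]
optional = true

[tool.poetry.group.docs.dependencies]
mkdocs = "1.6.0"
Optional groups are skipped by default and installed explicitly with poetry install --with docs.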
If your project involves building and publishing Python packages, you only need to configure the repository settings, include/exclude rules, and credentials. Poetry natively supports build and publish commands, so no further tooling is needed.
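A minimal sketch of that flow (the repository name and URL are placeholders):
poetry config repositories.my-repo https://example.com/simple/ # register a private index
poetry build                                                   # produce sdist and wheel archives in dist/
poetry publish --repository my-repo                            # upload them; credentials are configured separately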
Workflows
With our tooling, configuration, and structure in place, we can now cover the most common workflows using these tools. The following examples assume you are in the root directory of your project.
Starting a New Project
pyenv install 3.12.3 # or the version you need
pyenv local 3.12.3
poetry init
... # Configure as needed
poetry install --no-root # omit --no-root if you also want to install your root package
Installing an Existing Project
pyenv install 3.12.3 # or the version you need
pyenv local 3.12.3
poetry install --no-root
Developing Locally
poetry run ruff format # format the files
poetry run ruff check --fix # apply linting fixes
poetry run python -m unittest discover
poetry run mypy .
Continuous Integration (CI) Pipeline
poetry install --no-root
poetry run ruff format --check
poetry run ruff check
poetry run python -m unittest discover
poetry run mypy .
Project Structure
This section outlines efficient and organized structures for Python projects, whether you are working on a standard project or a mono-repository setup. We'll explore recommended practices for dependency management, environment isolation, and directory organization.
Standard Structure
By integrating all the previously discussed approaches, we can outline a desired structure for a standard Python project:
standard/
├── .gitignore
├── .python-version
├── .venv/
├── pyproject.toml
├── poetry.lock
├── poetry.toml
├── README.md
├── LICENSE
├── Dockerfile
├── main.py
├── src/
│   ├── __init__.py
│   ├── package_a/
│   │   ├── __init__.py
│   │   ├── module_x.py
│   │   └── ...
│   ├── package_b/
│   │   ├── __init__.py
│   │   ├── module_y.py
│   │   └── ...
│   └── ...
└── tests/
    ├── test_main.py
    ├── package_a/
    │   ├── __init__.py
    │   ├── test_module_x.py
    │   └── ...
    ├── package_b/
    │   ├── __init__.py
    │   ├── test_module_y.py
    │   └── ...
    └── ...
Highlights
Dependency Management: Use dependency groups correctly in pyproject.toml, and avoid shipping non-runtime dependencies when deploying/distributing the project.
Environment Isolation: Ensure the project has its own isolated environment, including test runners and type checkers, defined as dev dependencies.
Organized Structure: Maintain a clear directory structure separating source code, tests, and configuration files:
Source Code: All source code resides in the src/ directory (except for the application entry point if needed, which in the above case is main.py).
Tests: All test code resides in the tests/ directory, mirroring the source code structure for easy navigation.
Poetry Configuration: To include multiple desired packages under the same distribution wheel while maintaining a structured layout, add the following under the [tool.poetry] table in pyproject.toml:
packages = [
{ include = "package_a", from = "src", to = "standard" },
{ include = "package_b", from = "src", to = "standard" }
]
The file layout here differs from the standard Python packaging structure. However, the Poetry configuration allows for maintaining a standard deployable service structure while also supporting the packaging of modules (if desired) in a standard manner within the .whl distribution.
Mono-Repository Structure
In the final section, before the conclusions, we will outline how the previous approaches can also be applied in a more complex setting - a mono-repository.
Note: I do not suggest that a mono-repository is the best approach for structuring every project. This decision should be made on a case-by-case basis, and engineers should remember that simpler approaches may be sufficient for most projects.
A structure that I would suggest would be the following:
monorepo/
├── .gitignore
├── .python-version
├── .venv/
├── pyproject.toml
├── poetry.lock
├── poetry.toml
├── README.md
├── LICENSE
├── packages/
│   ├── package_a/
│   │   ├── .python-version
│   │   ├── .venv/
│   │   ├── pyproject.toml
│   │   ├── poetry.lock
│   │   ├── poetry.toml
│   │   ├── README.md
│   │   ├── LICENSE
│   │   ├── package_a/
│   │   │   ├── __init__.py
│   │   │   ├── module_x.py
│   │   │   └── ...
│   │   └── tests/
│   │       ├── __init__.py
│   │       ├── test_module_x.py
│   │       └── ...
│   ├── package_b/
│   │   ├── .python-version
│   │   ├── .venv/
│   │   ├── pyproject.toml
│   │   ├── poetry.lock
│   │   ├── poetry.toml
│   │   ├── README.md
│   │   ├── LICENSE
│   │   ├── package_b/
│   │   │   ├── __init__.py
│   │   │   ├── module_y.py
│   │   │   └── ...
│   │   └── tests/
│   │       ├── __init__.py
│   │       ├── test_module_y.py
│   │       └── ...
│   └── ...
└── services/
    ├── service_a/
    │   ├── .python-version
    │   ├── .venv/
    │   ├── src/
    │   │   ├── __init__.py
    │   │   └── ...
    │   ├── Dockerfile
    │   ├── main.py
    │   ├── pyproject.toml
    │   ├── poetry.lock
    │   ├── poetry.toml
    │   └── tests/
    │       ├── __init__.py
    │       ├── test_main.py
    │       └── ...
    ├── service_b/
    │   ├── .python-version
    │   ├── .venv/
    │   ├── src/
    │   │   ├── __init__.py
    │   │   └── ...
    │   ├── Dockerfile
    │   ├── main.py
    │   ├── pyproject.toml
    │   ├── poetry.lock
    │   ├── poetry.toml
    │   └── tests/
    │       ├── __init__.py
    │       ├── test_main.py
    │       └── ...
    └── ...
Highlights:
Root Directory Configuration:
Only specify tools that should be uniformly applied across the entire codebase, such as formatters and linters, at the root directory.
Separate Directories for Services and Packages - services and packages have their own directories, as their CI/CD pipelines function differently:
Packages: The CI/CD pipeline builds and publishes their source and wheel archives.
Services: The CI/CD pipeline includes Docker build and publish steps, integration tests, and deployment stages.
Isolated Environments - each package and service has its own isolated environment, including test runners and type checkers.
By including non-native Python test runners like pytest as dev dependencies within each service/package, you can open individual service/package folders as the root in your IDE. This enables the IDE to recognize the test runners and facilitate test execution.
Due to varying dependency versions across packages and services, it is crucial to provide exactly pinned type stubs to ensure accurate type checking. A dedicated type-check runner for each package/service simplifies this process and avoids the error-prone task of dynamically patching stubs with a single shared runner.
This approach provides the flexibility to use different versions of test and type-check runners, accommodating services that might not support the same versions.
Lastly, this approach makes it easy to parallelize CI pipelines, addressing the typically slow process of running tests and type checks during code quality checks.
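As a rough sketch of that per-project fan-out (paths are illustrative, and a real CI system would run these as separate parallel jobs rather than a sequential loop):
for project in packages/package_a packages/package_b services/service_a services/service_b; do
  (
    cd "$project"
    poetry install --no-root
    poetry run ruff check
    poetry run mypy .
    poetry run python -m unittest discover
  )
done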
Conclusions
Managing Python projects can often be a challenging experience. Fortunately, there are increasingly more tools available to help alleviate this pain. In this article, I recommend using the following tools:
pyenv to manage Python versions.
pipx to install and run global Python applications in isolated environments.
Poetry to manage Python project dependencies and packaging.
I have also proposed a project structure that I believe will be robust and easy to use for most large Python projects (its implementation with the above-mentioned tools can be found in the py-manage repository).
As the current state of software can often lead to deprecation and obsolescence, especially in the JavaScript and Python ecosystems, some of the tools mentioned might not remain viable in the long term. However, the core principles of dependency isolation, system integrity, and reproducibility discussed in this article will remain valuable indefinitely. Thank you for reading.
Appendix
pip freeze
One might point out that pip actually has an alternative to a .lock file with its freeze functionality:
pip freeze --help
Usage:
pip freeze [options]
Description:
Output installed packages in requirements format.
packages are listed in a case-insensitive sorted order.
The issue with pip freeze is that it records all currently installed packages with their exact versions, but does not differentiate between direct dependencies and transitive (sub-)dependencies. The resulting requirements.txt file includes many packages that are not direct dependencies, complicating dependency management and making it prone to errors. A common error is lingering unused dependencies.
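For example, installing a single direct dependency already yields a freeze output full of transitive packages (versions below are illustrative, output abbreviated):
pip install fastapi
pip freeze
$ anyio==4.4.0
$ fastapi==0.111.0
$ pydantic==2.7.3
$ starlette==0.37.2
$ typing-extensions==4.12.1
$ ...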
Consider a scenario where a developer sets up a project locally:
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
Here, the requirements.txt file was generated by a previous pip freeze and committed to the remote repository. Now, let’s assume someone updates dependency A from version 1.1 to 1.2. Before version 1.2, dependency A depended on dependency B, but this is no longer the case with version 1.2, so ideally, dependency B would no longer be needed in your virtual environment. However, running:
pip install A==1.2
would result in:
Collecting A==1.2
...
Installing collected packages: A
Attempting uninstall: A
Found existing installation: A 1.1
Uninstalling A-1.1:
Successfully uninstalled A-1.1
Successfully installed A-1.2
While the old version of dependency A would be uninstalled and the new one installed, the old dependency B would still remain in the .venv and would still appear as a dependency in the next pip freeze output.
Fundamentally, there is an issue where the .venv is treated as the source of truth for dependencies, even though the requirements.txt file populates it. If something goes wrong in this setup (which is already prone to errors), breaking the faulty cycle and correcting the issues becomes difficult.
Additionally, the project may encounter problems when developers' local virtual environments drift due to changes in Python minor/patch versions or local misconfigurations. This can result in different outputs in the requirements.txt file and introduce common PYTHONPATH issues.
In this section, we discussed the use of pip freeze in more detail, as it remains a common approach in many Python projects.
Additional Python Dependency Managers
In addition to Poetry, the Python ecosystem includes several other dependency managers, such as Pipenv, PDM, Hatch, and conda.
Given the number of active contributors, clarity of documentation, and range of supported features, Poetry is my preferred dependency manager. However, this is a personal preference, and I encourage developers to explore all available tools—you might find one that better suits your needs.