Optimizing Docker Images for Python Production Services
Crafting Lean Docker Images: Fundamental Concepts and Optimization Practices
This guide covers best practices for building optimized Docker images for CPU-based Python services, expanding on concepts from "Python Project Management Primer". We'll explore fundamental Docker optimization techniques such as multi-stage builds and caching strategies, then progress to practical implementations. For experienced developers, or those seeking an immediate implementation, the py-manage repository offers direct access to working code examples.
In an upcoming article, we'll expand these concepts to cover GPU-accelerated and CUDA-enabled Docker containers, addressing the unique considerations they require.
Note: this article assumes a basic understanding of Docker. For readers new to Docker, I recommend checking the official documentation to grasp its core concepts and purpose before proceeding.
Table of Contents
Docker Fundamentals
Multi-Stage Builds
Optimizing Caching Strategies
Optimizing Dockerfiles for Python Services
Crafting an Efficient Dockerfile
Image Size Optimization: A Comparative Analysis
[Bonus] Compiled Languages: Unlocking Full Optimization Potential
Conclusions
Docker Fundamentals
Before we explore specific implementations of Docker images for services and workbenches, it's crucial to understand two key concepts:
Multi-Stage Builds
Caching
These concepts are fundamental to our approach and will significantly impact our containerization strategies.
Multi-Stage Builds
Multi-stage builds are an underutilized yet powerful feature in Docker. According to the official Docker documentation, multi-stage builds offer two primary advantages:
They allow you to run build steps in parallel, making your build pipeline faster and more efficient.
They allow you to create a final image with a smaller footprint, containing only what's needed to run your program.
The first advantage is self-explanatory. The second warrants further elaboration - when constructing Docker images, we often require specific build tools to generate binaries or artifacts necessary for the final application image. However, once these components are built, the build tools become redundant. Ideally, we want to exclude these tools from the final Docker image to minimize its size. Multi-stage builds enable us to use one stage for compilation and another for the runtime environment, effectively separating build-time dependencies from runtime dependencies. This separation results in a leaner, more efficient final image.
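As a minimal sketch of this idea (the stage names, file names, and requirements.txt-based setup below are illustrative, not the project layout used later in this article), the pattern looks like this:
# Build stage (illustrative): anything installed here stays out of the final image
FROM python:3.12-slim AS build
WORKDIR /app
COPY requirements.txt .
# Install dependencies into an isolated prefix so only the result is copied forward
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt
# Runtime stage: only the installed packages and the application code
FROM python:3.12-slim AS runtime
WORKDIR /app
# /usr/local is where the official Python image looks for site-packages
COPY --from=build /install /usr/local
COPY . .
CMD ["python", "main.py"]
The build stage can pull in whatever tooling it needs; only what is explicitly copied out of it ends up in the runtime image.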
Optimizing Caching Strategies
Docker cache is a mechanism that stores intermediate layers from previous builds. It allows Docker to reuse these layers in subsequent builds when the corresponding Dockerfile instructions remain unchanged, thereby significantly reducing build times and resource consumption.
The official documentation does a great job here explaining how cache works:
Each instruction in this Dockerfile translates to a layer in your final image. You can think of image layers as a stack, with each layer adding more content on top of the layers that came before it.
And how cache gets invalidated (longer explanation can be found here):
Whenever a layer changes, that layer will need to be re-built… If a layer changes, all other layers that come after it are also affected.
Given this information, there are several steps one can take to make full use of the cache (a short sketch follows this list):
Position expensive layers early: To minimize the risk of invalidating expensive cache, place computationally intensive or time-consuming layers near the beginning of the Dockerfile.
Place frequently changing layers last: Position layers that change often towards the end of the Dockerfile to limit the number of subsequent layers that need rebuilding.
Keep layers small: Include only necessary files and dependencies to reduce the required cache size.
Minimize layer count: Reduce the total number of layers to limit the potential scope of cache invalidation.
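A rough single-stage sketch of these rules in a Python context (file names are illustrative): dependency installation sits above the application code, so editing a source file invalidates only the final COPY layers rather than forcing dependencies to be reinstalled.
FROM python:3.12-slim
WORKDIR /app
# Expensive, rarely changing layers first: dependency files and installation
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Frequently changing layers last: application source code
COPY src/ src/
COPY main.py .
CMD ["python", "main.py"]
With this ordering, a change under src/ reuses the cached dependency layer; only the final COPY layers are rebuilt.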
With a solid understanding of multi-stage builds and caching, we can now explore practical implementations of efficient Docker images for Python services and workbenches.
Optimizing Dockerfiles for Python Services
This section covers how to craft Dockerfiles for Python services. We'll use a standard Python project structure as the running example:
standard/
├── .gitignore
├── .python-version
├── .venv/
├── pyproject.toml
├── poetry.lock
├── poetry.toml
├── README.md
├── LICENSE
├── Dockerfile
├── main.py
├── src/
│   ├── __init__.py
│   ├── package_a/
│   │   ├── __init__.py
│   │   ├── module_x.py
│   │   └── ...
│   ├── package_b/
│   │   ├── __init__.py
│   │   ├── module_y.py
│   │   └── ...
│   └── ...
└── tests/
    ├── test_main.py
    ├── package_a/
    │   ├── __init__.py
    │   ├── test_module_x.py
    │   └── ...
    ├── package_b/
    │   ├── __init__.py
    │   ├── test_module_y.py
    │   └── ...
    └── ...
Crafting an Efficient Dockerfile
For the containerization exercise, key points include:
Dockerfile context: the root directory of the project.
Dependency management: Poetry.
Entry point: main.py (using FastAPI as the web application).
Source code location: the src directory (excluding the entry point).
Test location: a separate tests directory.
With all of this in mind, we can start writing the build stage of the Dockerfile:
FROM python:3.12.4-slim AS builder
RUN pip install --upgrade pip==24.1.1 && \
    pip install poetry==1.8.3
WORKDIR /app
COPY pyproject.toml poetry.toml poetry.lock ./
RUN poetry install --only main
Important details about this build stage:
Base image selection: We use python:3.12.4-slim for a smaller footprint. For arm64/linux, slim is 155MB vs 1.02GB for the full image.
Dependency management: The pip and poetry installations sit at the top, as they change infrequently. Versions are pinned (e.g., pip==24.1.1, poetry==1.8.3) for reproducibility in case the cache gets disabled or invalidated.
File copying strategy: pyproject.toml, poetry.toml, and poetry.lock are copied after that. This optimizes caching, as these files change more frequently than the tooling above them.
Installation optimization: poetry install --only main installs only runtime dependencies. This excludes development and auxiliary dependencies, reducing image size (see the note after this list on where the virtual environment ends up).
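One assumption worth making explicit: copying /app/.venv into the runtime stage (shown below) only works if Poetry creates the virtual environment inside the project directory. In this project that behaviour presumably comes from the committed poetry.toml; if your setup does not configure it, a minimal sketch is to enforce it in the builder stage before running poetry install:
# Builder-stage sketch: force Poetry to create the environment inside /app,
# so that /app/.venv exists for the runtime stage to copy. Skip this line if
# poetry.toml already sets virtualenvs.in-project = true.
RUN poetry config virtualenvs.in-project true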
Now we can add the runtime stage, which completes the Dockerfile:
FROM python:3.12.4-slim AS builder
RUN pip install --upgrade pip==24.1.1 && \
    pip install poetry==1.8.3
WORKDIR /app
COPY pyproject.toml poetry.toml poetry.lock ./
RUN poetry install --only main
FROM python:3.12.4-slim AS runtime
WORKDIR /app
ENV PATH="/app/.venv/bin:$PATH"
COPY src src
COPY main.py .
EXPOSE 8080
COPY --from=builder /app/.venv .venv
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8080"]
Core concepts about the runtime stage:
Virtual environment setup: ENV PATH="/app/.venv/bin:$PATH" prepends the virtual environment's bin directory to the system PATH. This ensures that Python uses packages installed in the virtual environment without explicit activation; it effectively isolates the application's dependencies and simplifies Dockerfile commands by avoiding manual venv activation in each RUN instruction.
Dependency transfer: COPY --from=builder /app/.venv .venv copies the virtual environment built in the builder stage. This instruction is pushed as close to the end of the Dockerfile as possible to fully exploit parallelization: the runtime stage only has to wait for the builder stage at this COPY layer. Positioning it late also reduces the number of layers that must be rebuilt when the build stage changes.
Image optimization: For the runtime stage, it is essential to use a Docker base image that is as small as possible.
An important point for both stages is to use the same base Python image version as specified in your pyproject.toml; one way to enforce this is sketched below.
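One way to guarantee that both stages stay on the same version is to declare it once with a build argument. This is a sketch rather than part of the project's Dockerfile, and PYTHON_VERSION is a name introduced here purely for illustration:
# Declare the version once, before the first FROM, and reuse it in both stages
ARG PYTHON_VERSION=3.12.4
FROM python:${PYTHON_VERSION}-slim AS builder
# ... build stage as above ...
FROM python:${PYTHON_VERSION}-slim AS runtime
# ... runtime stage as above ...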
Image Size Optimization: A Comparative Analysis
Our optimized Dockerfile produces a final image of 200MB for this project structure and set of dependencies. In contrast, a naive approach results in a significantly larger image:
# DO NOT USE THIS DOCKERFILE. THIS IS ONLY FOR EDUCATIONAL PURPOSES
FROM python:3.12.4
RUN pip install --upgrade pip==24.1.1 && \
    pip install poetry==1.8.3
WORKDIR /app
COPY . .
RUN poetry install
ENV PATH="/app/.venv/bin:$PATH"
EXPOSE 8080
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8080"]
This unoptimized Dockerfile creates a 1.37GB image for the arm64/linux architecture/OS.
Note: be careful when comparing image sizes listed in remote registries; they usually report the compressed image size.
In this specific example, the optimized Dockerfile delivers only a modest build-speed improvement over the unoptimized version (15.5 ± 1.6s vs 19.9 ± 1.5s for --no-cache builds, a 22% reduction). However, the impact can vary greatly depending on the complexity and structure of your application's Dockerfile: in more complex scenarios, multi-stage builds can reduce build times by a factor of two or more.
[Bonus] Compiled Languages: Unlocking Full Optimization Potential
While multi-stage builds offer significant benefits for all languages, their impact is particularly profound for compiled languages like Go or Rust. Unlike interpreted languages such as Python, which need a runtime interpreter and its supporting tooling, compiled languages produce standalone executables. This characteristic allows for extreme optimization in Docker images.
Consider Python: even with multi-stage builds, the final image must include the Python interpreter and necessary libraries, resulting in a base image size of at least 100-200MB. In contrast, compiled languages can leverage multi-stage builds to create extraordinarily lean images.
Let's examine a simple "Hello World" HTTP server written in Go:
package main

import (
    "fmt"
    "log"
    "net/http"
)

func helloHandler(w http.ResponseWriter, r *http.Request) {
    if r.URL.Path != "/" {
        http.NotFound(w, r)
        return
    }
    fmt.Fprintf(w, "Hello, World!")
}

func main() {
    http.HandleFunc("/", helloHandler)
    fmt.Println("Server starting on port 8080...")
    if err := http.ListenAndServe(":8080", nil); err != nil {
        log.Fatal(err)
    }
}
This Go application can be containerized using the following Dockerfile:
FROM golang:1.22.5-alpine AS builder
WORKDIR /app
# Copy dependency files first: changes less often, improving cache efficiency
COPY go.mod go.sum* ./
RUN go mod download
COPY *.go ./
RUN CGO_ENABLED=0 go build -ldflags="-w -s" -a -installsuffix cgo -o main .
FROM scratch AS runtime
WORKDIR /app
COPY --from=builder /app/main .
EXPOSE 8080
CMD ["./main"]
The resulting Docker image is remarkably small, at just 4.59MB, achieved through multi-stage builds, the use of a scratch base image, and the inclusion of only the compiled binary. While real-world applications may need additional components like SSL certificates or timezone data (a sketch of that case follows), compiled-language images typically remain far smaller than those of interpreted languages. This approach demonstrates how multi-stage builds for compiled languages can produce highly optimized, secure, and performant Docker images containing only the essentials needed to run the application.
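As a hedged illustration of those extra components (not part of the example above): if the Go binary needed to make HTTPS calls or handle timezones, the certificate bundle and zone data could be copied out of the builder stage, keeping the scratch-based image only slightly larger.
FROM golang:1.22.5-alpine AS builder
# ca-certificates and tzdata are installed here only so the runtime stage can copy them
RUN apk add --no-cache ca-certificates tzdata
WORKDIR /app
COPY go.mod go.sum* ./
RUN go mod download
COPY *.go ./
RUN CGO_ENABLED=0 go build -ldflags="-w -s" -a -installsuffix cgo -o main .

FROM scratch AS runtime
WORKDIR /app
# Certificates and timezone database copied alongside the binary
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
COPY --from=builder /usr/share/zoneinfo /usr/share/zoneinfo
COPY --from=builder /app/main .
EXPOSE 8080
CMD ["./main"]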
Conclusions
Implementing Docker optimization strategies yields significant benefits across several dimensions. The direct impacts of these strategies include:
Drastic reduction in Docker image sizes (e.g., from 1.37GB to 200MB, an 85% decrease).
Improved image build speeds through strategic parallelization and effective cache utilization.
These optimizations have second-order effects on development and operations:
Accelerated build and deployment pipelines, shortening development feedback loops and release times.
Reduced computational resource requirements and lower storage costs.
Potentially improved auto-scaling performance in cloud environments, as smaller images enable faster container start times and more agile resource allocation.