Docker Multi-stage build
Container images with a smaller footprint
Containers are commonly used to host applications today. Almost any application has two types of dependencies: build-time and run-time. The build-time dependencies are generally far more numerous, and tend to carry more CVEs, than the run-time ones. In a classic Docker build, all the dependencies and tools needed for both building and running an application are included in a single Docker image. This creates an oversized image. Oversized Docker images have the following downsides:
- Slower build and deployment cycles
- Increased storage and bandwidth costs
- Reduced application responsiveness
The multi-stage build feature enables us to create smaller container images with better caching and a smaller security footprint. In this blog, we will focus on the concept of multi-stage builds in Docker and use it to create a lightweight and secure Docker image.
What are Multi-Stage Builds?
Docker multi-stage build is a feature that allows us to divide a Docker build into multiple intermediate stages. This also means we can separate the build environment from the runtime environment into different stages. Each stage starts from a fresh base image and performs a specific task, and the results/artifacts of these intermediate stages are finally combined to produce the final image.
Example Dockerfile for a multi-stage build of an application written in Go:
# Build stage
FROM golang:1.23 AS build
WORKDIR /app
COPY go.* .
RUN go mod download
COPY . .
# CGO_ENABLED=0 produces a statically linked binary that runs on distroless/static
RUN CGO_ENABLED=0 go build -o binary .
# Runtime stage
FROM gcr.io/distroless/static-debian12:nonroot
COPY --from=build /app/binary /app/binary
ENTRYPOINT ["/app/binary"]
The following points are to be noted with respect to multi-stage builds:
- Every FROM instruction defines a stage.
- The AS aliases are optional; if you don't name your stages, they can still be referred to by their sequence number (illustrated in the sketch after this list).
- COPY --from copies artifacts from another stage.
- The order of stages in the Dockerfile matters: it is impossible to COPY --from a stage defined below the current stage.
- When all stages and COPY --from=<stage> instructions are defined in one Dockerfile, the Docker build engine (BuildKit) can compute the right build order, skip unused stages, and execute independent stages concurrently.
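A minimal sketch illustrating two of these rules, assuming the same Go project layout as above: the first stage has no alias, so the final stage references it by its index.
# Stage 0: no AS alias, so it can only be referenced by its index
FROM golang:1.23
WORKDIR /app
COPY . .
RUN CGO_ENABLED=0 go build -o /out/app .
# Final stage: copy the artifact from stage 0 by its sequence number
FROM gcr.io/distroless/static-debian12:nonroot
COPY --from=0 /out/app /app
ENTRYPOINT ["/app"]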
Example Dockerfile for a multi-stage build of an application written in Rust:
# Build stage
FROM rust:1.67 AS build
WORKDIR /usr/src/app
COPY . .
RUN cargo install --path .
# Runtime stage
FROM debian:bullseye-slim
# extra-runtime-dependencies is a placeholder; replace it with the packages your binary needs at run time
RUN apt-get update && \
    apt-get install -y extra-runtime-dependencies && \
    rm -rf /var/lib/apt/lists/*
COPY --from=build /usr/local/cargo/bin/app /usr/local/bin/app
CMD ["myapp"]
Example Dockerfile for a multi-stage build of an application written in Python:
# syntax=docker/dockerfile:1
# Base image
FROM python:3.12 AS build
RUN apt-get update && apt-get install -y build-essential curl
ENV VIRTUAL_ENV=/opt/venv \
PATH="/opt/venv/bin:$PATH"
ADD https://astral.sh/uv/install.sh /install.sh
RUN chmod +x /install.sh && /install.sh && rm /install.sh
COPY ./requirements.txt .
RUN /root/.cargo/bin/uv venv /opt/venv && \
/root/.cargo/bin/uv pip install --no-cache -r requirements.txt
# App image
FROM python:3.12-slim-bookworm
COPY --from=build /opt/venv /opt/venv
# Activate the virtualenv in the container
ENV PATH="/opt/venv/bin:$PATH"
# Security Context
RUN useradd -m nonroot
USER nonroot
# Env configuration
WORKDIR /opt
COPY app.py .
EXPOSE 8080
ENTRYPOINT ["python","app.py"]
Some other optimization techniques
Layer Optimization
Each instruction (like RUN, COPY, ADD) we write in our Dockerfile creates a new layer in the Docker image. These layers add to the image size as well as the build time. We should minimize the number of layers.
- Combine multiple RUN commands into a single instruction. This minimizes redundant layers and keeps the image cleaner.
- Use && to chain commands and clean up in the same layer, as in the example below.
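A minimal sketch of this pattern: installing packages and cleaning up the apt cache in one RUN instruction, so the deleted cache never ends up in a layer (the package names are illustrative):
# Install and clean up in a single layer
RUN apt-get update && \
    apt-get install -y --no-install-recommends curl ca-certificates && \
    rm -rf /var/lib/apt/lists/*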
Use Docker Buildkit
Docker BuildKit offers improved performance, security and more flexible cache invalidation for building Docker images. It has been the default builder since Docker Engine 23.0; on older versions, enable it with
DOCKER_BUILDKIT=1 docker build -t myapp .
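BuildKit also enables features such as cache mounts, which keep a package manager's cache between builds without baking it into an image layer. A minimal sketch for a pip-based stage (the requirements.txt path is assumed):
# syntax=docker/dockerfile:1
FROM python:3.12-slim
COPY requirements.txt .
# The pip cache lives in a build-time cache mount, not in the image
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt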
Organize your Dockerfile
Place the stages that are less likely to change towards the beginning of the Dockerfile. This allows the cache to be reused more effectively for subsequent builds.
Move the package-installation layer towards the top of the Dockerfile. We can take the package manifest from the code (package.json, requirements.txt), copy it first and run the package installation, then copy the application code afterwards. This ensures that the package-installation layer is only rebuilt when the dependencies change, not whenever our business logic changes, as the sketch below shows.
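A minimal sketch of this ordering for a Node.js-style project (package.json is used purely as an illustration; the same idea applies to requirements.txt or go.mod):
# Copy only the dependency manifests first so this layer stays cached
COPY package.json package-lock.json ./
RUN npm ci
# Copy the application code last; changes here do not invalidate the install layer
COPY . .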
Eliminating Redundant Files
We should only include in the image the artifacts the application actually needs and that rarely change. Things like configuration files or other resources can be mounted into the container at run time rather than added to the image.
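For instance, a configuration file can be supplied at run time with a bind mount instead of being copied into the image (config.yaml, the target path, and the image name are illustrative):
docker run --rm \
  -v "$(pwd)/config.yaml:/opt/config.yaml:ro" \
  myapp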
We should leverage a .dockerignore file, similar to .gitignore. It lets us exclude specific files and directories from the build context, and therefore from our final image. We can add a .dockerignore file to the root of our project. For example, we could add large data files, logs, model checkpoints, and temporary files to it.
# Exclude large datasets
data/

# Exclude virtual environment
venv/

# Exclude cache, logs, and temporary files
__pycache__/
*.log
*.tmp
*.pyc
*.pyo
*.pyd
.pytest_cache
.git
.gitignore
README.md

# Exclude model training checkpoints and tensorboard logs
checkpoints/
runs/
Using slimmer images for build & runtime
- Alpine is a very popular minimal base image for shrinking the images of our own containerized applications.
- Distroless images contain only your application and its runtime dependencies. They do not contain package managers, shells or any other programs.
- Scratch is an empty image, so it is ideal for statically linked binaries that do not require libc. Go, Rust, and other languages can compile to such binaries (see the sketch after this list).
- Chiselled Ubuntu is a variation of distroless container images built using packages from the Ubuntu distribution.
- Chainguard Images are a collection of container images designed for security and minimalism. Many Chainguard Images are distroless; they contain only an open-source application and its runtime dependencies.
- UBI Micro is a minimal container image with a reduced attack surface, well suited to highly minimized applications.
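As a sketch of the scratch option, the statically linked binary produced by the Go build stage above could be packaged like this (the stage name build refers to that example):
# Runtime stage on an empty base image
FROM scratch
COPY --from=build /app/binary /app/binary
ENTRYPOINT ["/app/binary"]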
Debugging the Container
To avoid the inconvenience of not being able to debug our container when using distroless, Distroless provides an alternative: images with the :debug tag, which include shell access (via a BusyBox shell). If our application runs into trouble, we should keep a debug version of our Dockerfile that we can deploy in case we need to kubectl exec or docker exec into our container.
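A minimal sketch of such a debug variant for the Go example above: only the base image tag changes, and the BusyBox shell becomes available for exec sessions.
# Debug runtime stage with a BusyBox shell
FROM gcr.io/distroless/static-debian12:debug
COPY --from=build /app/binary /app/binary
ENTRYPOINT ["/app/binary"]
Then, once a container built from this variant is running:
# Open a shell inside the running container or pod
docker exec -it <container> sh
kubectl exec -it <pod> -- sh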
Conclusion
Multi-stage builds in Docker offer a powerful method to create lean, secure, and efficient container images. By separating the build environment from the runtime environment, we can significantly reduce the size of Docker images without sacrificing functionality.
Appendix
BuildKit features
Debugging build failures:
# The syntax directive below belongs at the top of the Dockerfile being debugged
# syntax=docker/dockerfile:1.3
# In the shell, enable the experimental buildx debugger and run the build
export BUILDX_EXPERIMENTAL=1
export BUILDKIT_PROGRESS=plain
export BUILDKIT_COLORS="run=green:warning=yellow:error=red:cancel=cyan"
docker buildx debug --invoke /bin/sh --on=error build .
If the build fails (--on=error) at any point, we will be dropped into the container, where we can explore the build context and debug.
Here-docs allow us to pass multiline scripts to RUN and COPY instructions:
# syntax = docker/dockerfile:1.3-labs
FROM debian
RUN <<eot bash
apt-get update
apt-get install -y vim
eot
In the past we would have to use && if we wanted to put multiple commands into a single RUN; now, with here-docs, we can write a normal script. Additionally, the first line can specify an interpreter, so we can, for example, write a Python script too:
# syntax = docker/dockerfile:1.3-labs
FROM python:3.6
RUN <<eot
#!/usr/bin/env python
print("hello world")
eot