Docker

Is Your Container Image Really Distroless?

Containerization helped drastically improve the security of applications by providing engineers with greater control over the runtime environment of their applications. However, a significant time investment is required to maintain the security posture of those applications, given the daily discovery of new vulnerabilities as well as regular releases of languages and frameworks. 

The concept of “distroless” images offers the promise of greatly reducing the time needed to keep applications secure by eliminating most of the software contained in typical container images. This approach also reduces the amount of time teams spend remediating vulnerabilities, allowing them to focus only on the software they are using. 

In this article, we explain what makes an image distroless, describe tools that make the creation of distroless images practical, and discuss whether distroless images live up to their potential.

2400x1260 is your image really distroless

What’s a distro?

A Linux distribution is a complete operating system built around the Linux kernel, comprising a package management system, GNU tools and libraries, additional software, and often a graphical user interface.

Common Linux distributions include Debian, Ubuntu, Arch Linux, Fedora, Red Hat Enterprise Linux, CentOS, and Alpine Linux (which is more common in the world of containers). These Linux distributions, like most Linux distros, treat security seriously, with teams working diligently to release frequent patches and updates to known vulnerabilities. A key challenge that all Linux distributions must face involves the usability/security dilemma. 

On its own, the Linux kernel is not very usable, so many utility commands are included in distributions to cover a large array of use cases. Having the right utilities included in the distribution without having to install additional packages greatly improves a distro’s usability. The downside of this increase in usability, however, is an increased attack surface area to keep up to date. 

A Linux distro must strike a balance between these two elements, and different distros have different approaches to doing so. A key aspect to keep in mind is that a distro that emphasizes usability is not “less secure” than one that does not emphasize usability. What it means is that the distro with more utility packages requires more effort from its users to keep it secure.

Multi-stage builds

Multi-stage builds allow developers to separate build-time dependencies from runtime ones. Developers can now start from a full-featured build image with all the necessary components installed, perform the necessary build step, and then copy only the result of those steps to a more minimal or even an empty image, called “scratch”. With this approach, there’s no need to clean up dependencies and, as an added bonus, the build stages are also cacheable, which can considerably reduce build time. 

The following example shows a Go program taking advantage of multi-stage builds. Because the Golang runtime is compiled into the binary, only the binary and root certificates need to be copied to the blank slate image.

FROM golang:1.21.5-alpine as build
WORKDIR /
COPY go.* .
RUN go mod download
COPY . .
RUN go build -o my-app
 
 
FROM scratch
COPY --from=build
  /etc/ssl/certs/ca-certificates.crt
  /etc/ssl/certs/ca-certificates.crt
COPY --from=build /my-app /usr/local/bin/my-app
ENTRYPOINT ["/usr/local/bin/my-app"]

BuildKit

BuildKit, the current engine used by docker build, helps developers create minimal images thanks to its extensible, pluggable architecture. It provides the ability to specify alternative frontends (with the default being the familiar Dockerfile) to abstract and hide the complexity of creating distroless images. These frontends can accept more streamlined and declarative inputs for builds and can produce images that contain only the software needed for the application to run. 

The following example shows the input for a frontend for creating Python applications called mopy by Julian Goede.

#syntax=cmdjulian/mopy
apiVersion: v1
python: 3.9.2
build-deps:
  - libopenblas-dev
  - gfortran
  - build-essential
envs:
  MYENV: envVar1
pip:
  - numpy==1.22
  - slycot
  - ./my_local_pip/
  - ./requirements.txt
labels:
  foo: bar
  fizz: ${mopy.sbom}
project: my-python-app/

So, is your image really distroless?

Thanks to new tools for creating container images like multi-stage builds and BuildKit, it is now a lot more practical to create images that only contain the required software and its runtime dependencies. 

However, many images claiming to be distroless still include a shell (usually Bash) and/or BusyBox, which provides many of the commands a Linux distribution does — including wget — that can leave containers vulnerable to Living off the land (LOTL) attacks. This raises the question, “Why would an image trying to be distroless still include key parts of a Linux distribution?” The answer typically involves container initialization. 

Developers often have to make their applications configurable to meet the needs of their users. Most of the time, those configurations are not known at build time so they need to be configured at run time. Often, these configurations are applied using shell initialization scripts, which in turn depend on common Linux utilities such as sed, grep, cp, etc. When this is the case, the shell and utilities are only needed for the first few seconds of the container’s lifetime. Luckily, there is a way to create true distroless images while still allowing initialization using tools available from most container orchestrators: init containers.

Init containers

In Kubernetes, an init container is a container that starts and must complete successfully before the primary container can start. By using a non-distroless container as an init container that shares a volume with the primary container, the runtime environment and application can be configured before the application starts. 

The lifetime of that init container is short (often just a couple seconds), and it typically doesn’t need to be exposed to the internet. Much like multi-stage builds allow developers to separate the build-time dependencies from the runtime dependencies, init containers allow developers to separate initialization dependencies from the execution dependencies. 

The concept of init container may be familiar if you are using relational databases, where an init container is often used to perform schema migration before a new version of an application is started.

Avatar

shahbazchandio

About Author

Leave a comment

Your email address will not be published. Required fields are marked *

You may also like

Docker

containerd vs. Docker: Understanding Their Relationship and How They Work Together

During the past decade, containers have revolutionized software development by introducing higher levels of consistency and scalability. Now, developers can work without
Docker

11 Years of Docker: Shaping the Next Decade of Development

Eleven years ago, Solomon Hykes walked onto the stage at PyCon 2013 and revealed Docker to the world for the