Tips for smaller Docker images

February 2018

Introduction

Docker builds images automatically by reading instructions from a Dockerfile. This is a plain-text file that contains, in order, the commands needed to build your image. The file must use a specific format and a specific set of instructions. If you have any questions, or you are new to Docker or to writing Dockerfiles, start with the Dockerfile Reference page.

When building Docker images you should always aim for the smallest size possible. Furthermore, if you share layers among images it will be easier (and faster) to deploy targeted apps.

As you know, commands in a Dockerfile create new layers (RUN, COPY and ADD add filesystem layers; other commands such as CMD add only metadata). You may have noticed that many publicly available Dockerfiles use the following trick:

FROM debian

RUN apt-get -qq update && apt-get install -qq -y --no-install-recommends bzip2 gcc && rm -rf /var/lib/{apt,dpkg,cache,log}/ /tmp/* /var/tmp/*

This is a reasonable solution. Installing bzip2 and gcc on top of the debian base image in a single RUN command produces an image of 212MB (at the time of writing):

$ docker build -t="debian-bzip-gcc-v1" .
Sending build context to Docker daemon  2.048kB
Step 1/2 : FROM debian
 ---> 1b3ec9d977fb
Step 2/2 : RUN apt-get -qq update && apt-get install -qq -y --no-install-recommends bzip2 gcc && rm -rf /var/lib/{apt,dpkg,cache,log}/ /tmp/* /var/tmp/*
 ---> Running in e89204303fe4
debconf: delaying package configuration, since apt-utils is not installed
(Reading database ... 6487 files and directories currently installed.)
.
.
.
Setting up bzip2 (1.0.6-8.1) ...
Setting up gcc (4:6.3.0-4) ...
Processing triggers for libc-bin (2.24-11+deb9u1) ...
Removing intermediate container e89204303fe4
 ---> b83f4616276b
Successfully built b83f4616276b
Successfully tagged debian-bzip-gcc-v1:latest
$ docker history debian-bzip-gcc-v1
IMAGE               CREATED             CREATED BY                                      SIZE
b83f4616276b        28 seconds ago      /bin/sh -c apt-get -qq update && apt-get ins…   112MB               
1b3ec9d977fb        12 days ago         /bin/sh -c #(nop)  CMD ["bash"]                 0B                  
<missing>           12 days ago         /bin/sh -c #(nop) ADD file:7d3b21b18d7bc6d6d…   100MB               
$ docker images debian-bzip-gcc-v1
REPOSITORY           TAG                 IMAGE ID            CREATED              SIZE
debian-bzip-gcc-v1   latest              b83f4616276b        About a minute ago   212MB

Every layer uses disk space. You can see this for yourself when pulling images from a registry. To see another example in action, check out the Dockerfile for buildpack-deps.
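To see why the commands above are chained into a single RUN, compare a variant that splits them. This is only a sketch and was not built for this article. The cleanup path is written out in full because /bin/sh in the debian image is dash, which does not expand {a,b} patterns.

```dockerfile
FROM debian

# Each RUN below produces its own layer.
RUN apt-get -qq update
RUN apt-get install -qq -y --no-install-recommends bzip2 gcc

# This hides the package lists in the final filesystem, but the layers
# created by the two RUN commands above still contain them, so the image
# does not get smaller. Cleanup only helps inside the same RUN.
RUN rm -rf /var/lib/apt/lists/*
```

Running docker history on such an image should show one entry per RUN, with the cleanup layer adding nothing back.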

OK, that’s it. If you are interested in smaller docker images let’s go for another example, shall we?

Use multi-stage builds

Since Docker 17.05 you can use multi-stage builds to reduce the size of your final image. They work without the need to jump through hoops to reduce the number of intermediate layers or to remove intermediate files during the build.

Most of the time you want both: to benefit from the build cache and to minimize the number of image layers.
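A rough illustration of the cache part, assuming a hypothetical project whose sources sit in the build context and are copied to /src (both are made up for this sketch): commands that rarely change go first, so that a source edit invalidates only the layers from COPY onward.

```dockerfile
FROM debian

# Rarely changes: Docker reuses the cached layer while this line stays the same.
RUN apt-get -qq update && apt-get install -qq -y --no-install-recommends gcc

# Changes on every source edit: placed last, so only this layer and any
# commands after it are rebuilt.
COPY . /src
```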

Example of build stages:
1. Install the tools you need to build your app.
2. Install or update library dependencies.
3. Generate your application.

In this example we will build a Go app image.

Start with main.go:

package main

import "fmt"

func main() {
	fmt.Println("Hello, world!")
}

First, let’s containerize this app with the following Dockerfile:

FROM golang:1.8

WORKDIR /go/src/app
ADD . /go/src/app

RUN go-wrapper download   
RUN go-wrapper install

CMD ["/go/bin/app"]

Build and run the image with:

$ docker build -t="go-v1" .
Sending build context to Docker daemon  3.072kB
Step 1/6 : FROM golang:1.8 as build
 ---> 0d283eb41a92
Step 2/6 : WORKDIR /go/src/app
 ---> Using cache
 ---> 179e54f72c42
Step 3/6 : ADD . /go/src/app
 ---> 497ef265fb8b
Step 4/6 : RUN go-wrapper download
 ---> Running in db28e8cebfc3
+ exec go get -v -d
Removing intermediate container db28e8cebfc3
 ---> a2ff96353469
Step 5/6 : RUN go-wrapper install
 ---> Running in 6f9c999af90b
+ exec go install -v
app
Removing intermediate container 6f9c999af90b
 ---> 990d17245be8
Step 6/6 : CMD ["/go/bin/app"]
 ---> Running in be3400a7efef
Removing intermediate container be3400a7efef
 ---> 11ba6fa5350b
Successfully built 11ba6fa5350b
Successfully tagged go-v1:latest
$ docker run go-v1 
Hello, world!

Well, let’s check how big our image is:

$ docker images go-v1
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
go-v1               latest              11ba6fa5350b        14 seconds ago      715MB

OK, let’s try a multi-stage Docker build. With this approach you use multiple FROM commands in your Dockerfile. Each FROM can use a different base and begins a new stage of the build. You selectively copy artifacts from one stage to another, leaving behind everything you don’t want in the final image. Let’s adapt the Go example to use a multi-stage build.

FROM golang:1.8 as build

WORKDIR /go/src/app
ADD . /go/src/app

RUN go-wrapper download
RUN go-wrapper install

FROM golang:1.8
COPY --from=build /go/bin/app /

CMD ["/app"]

Build and run:

$ docker build -t="go-v2" .
Sending build context to Docker daemon  4.096kB
Step 1/8 : FROM golang:1.8 as build
 ---> 0d283eb41a92
Step 2/8 : WORKDIR /go/src/app
 ---> Using cache
 ---> 179e54f72c42
Step 3/8 : ADD . /go/src/app
 ---> ba0b86b4db8e
Step 4/8 : RUN go-wrapper download
 ---> Running in 8c35165c884f
+ exec go get -v -d
Removing intermediate container 8c35165c884f
 ---> c9b852cd2bb6
Step 5/8 : RUN go-wrapper install
 ---> Running in be3d7d1bdcb2
+ exec go install -v
app
Removing intermediate container be3d7d1bdcb2
 ---> 4a26015829f9
Step 6/8 : FROM golang:1.8
 ---> 0d283eb41a92
Step 7/8 : COPY --from=build /go/bin/app /
 ---> 11a3bf1274ce
Step 8/8 : CMD ["/app"]
 ---> Running in 737e17331cd5
Removing intermediate container 737e17331cd5
 ---> 47b521392bfc
Successfully built 47b521392bfc
Successfully tagged go-v2:latest
$ docker run --rm -it go-v2
Hello, world!

It works! Go ahead and inspect the image:

$ docker images go-v2
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
go-v2               latest              47b521392bfc        17 seconds ago      710MB

The go-v2 image consists of fewer layers, so we saved a few MB. The saving is small because the final stage is still based on the full golang:1.8 image. Can we save more?

Use Alpine versions of base images

Alpine Linux is a security-oriented, lightweight Linux distribution based on musl libc and busybox.

There are many publicly available base images that use Alpine (instead of Debian, Ubuntu, CentOS…) as a ‘core’. If you are looking for one, just make sure its tag contains the ‘-alpine’ suffix.

The Alpine Linux image is only 5MB and it comes with a package manager, so it should cover 98% of real-world cases ;)

There is an Alpine version of the golang:1.8 image. Compare the sizes of both:

$ docker images "golang:1.8*"
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
golang              1.8                 0d283eb41a92        10 days ago         713MB
golang              1.8-alpine          4cb86d3661bf        2 weeks ago         257MB

257MB instead of 713MB! All we did was add -alpine to the tag in the FROM command!

Going back to our case, replace the regular golang image with the Alpine version:

FROM golang:1.8-alpine

WORKDIR /go/src/app
ADD . /go/src/app

RUN go-wrapper download   
RUN go-wrapper install

CMD ["/go/bin/app"]

Build & run:

$ docker build -t="go-v3" -f Dockerfile-v3 .
Sending build context to Docker daemon   5.12kB
Step 1/6 : FROM golang:1.8-alpine
 ---> 4cb86d3661bf
Step 2/6 : WORKDIR /go/src/app
Removing intermediate container e52de6fd1e15
 ---> 6d0cacab27a1
Step 3/6 : ADD . /go/src/app
 ---> 5e631e28f90e
Step 4/6 : RUN go-wrapper download
 ---> Running in d302fc0274ff
+ exec go get -v -d
Removing intermediate container d302fc0274ff
 ---> ff6e6651161b
Step 5/6 : RUN go-wrapper install
 ---> Running in a728130421c1
+ exec go install -v
app
Removing intermediate container a728130421c1
 ---> 820d37d80391
Step 6/6 : CMD ["/go/bin/app"]
 ---> Running in 7a539c1adf69
Removing intermediate container 7a539c1adf69
 ---> c5b936627ac1
Successfully built c5b936627ac1
Successfully tagged go-v3:latest
$ docker run --rm -it go-v3
Hello, world!

It worked. Where is the catch? Vanilla images use the full glibc as their standard C library, while Alpine uses musl libc. musl takes less space, but some dependencies in your project may expect glibc and fail to build or run against musl. So before you replace all FROM commands in your Dockerfiles repo, make sure it is fine for your case.
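If the Alpine variant works for your app, it can be combined with the multi-stage build from the previous section. The following is a sketch only, not built or measured for this article; the stage name build and the plain alpine final image are assumptions.

```dockerfile
# Build stage: Alpine-based Go toolchain.
FROM golang:1.8-alpine as build

WORKDIR /go/src/app
ADD . /go/src/app

RUN go-wrapper download
RUN go-wrapper install

# Final stage: plain Alpine, without the Go toolchain.
# The binary was built on Alpine in the stage above, so both stages
# share the same C library (musl).
FROM alpine
COPY --from=build /go/bin/app /

CMD ["/app"]
```

Keeping both stages on Alpine avoids mixing a binary built against one C library with a runtime image that ships another.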

Use distroless base images

Google’s distroless base images contain only your app and its runtime dependencies. There are no package managers, shells or other programs you would expect to find in a regular Linux distro (vanilla images).

How do you get them? The distroless project uses the gcr.io Docker registry. At the time of writing, the following images were published:

  • gcr.io/distroless/base
  • gcr.io/distroless/python2.7
  • gcr.io/distroless/python3
  • gcr.io/distroless/nodejs
  • gcr.io/distroless/java
  • gcr.io/distroless/java/jetty
  • gcr.io/distroless/cc
  • gcr.io/distroless/dotnet

Going back to our example, let’s use a distroless base image now. According to the documentation, gcr.io/distroless/base can be used to run Go apps.

FROM golang:1.8 as stage1

WORKDIR /go/src/app
ADD . /go/src/app

RUN go-wrapper download   
RUN go-wrapper install


FROM gcr.io/distroless/base

COPY --from=stage1 /go/bin/app /

CMD ["/app"]

Build & run to see if we are good or not:

$ docker build -t="go-v4" .
Sending build context to Docker daemon  6.144kB
Step 1/8 : FROM golang:1.8 as stage1
 ---> 0d283eb41a92
Step 2/8 : WORKDIR /go/src/app
 ---> Using cache
 ---> 179e54f72c42
Step 3/8 : ADD . /go/src/app
 ---> cbfdbc805980
Step 4/8 : RUN go-wrapper download
 ---> Running in 1440550687ea
+ exec go get -v -d
Removing intermediate container 1440550687ea
 ---> 6e26b4a6d177
Step 5/8 : RUN go-wrapper install
 ---> Running in 8f0ba0fd5008
+ exec go install -v
app
Removing intermediate container 8f0ba0fd5008
 ---> 23edaf0d592e
Step 6/8 : FROM gcr.io/distroless/base
latest: Pulling from distroless/base
bb8371eaf726: Pull complete 
Digest: sha256:4f28178a3746a9145742c5802e4a2479b2cd39f6359db5ec8b7e7f7b4a592039
Status: Downloaded newer image for gcr.io/distroless/base:latest
 ---> 89c6ea43854e
Step 7/8 : COPY --from=stage1 /go/bin/app /
 ---> 15a4c3f5b291
Step 8/8 : CMD ["/app"]
 ---> Running in 52154b62c25f
Removing intermediate container 52154b62c25f
 ---> 8c2fcc22abbc
Successfully built 8c2fcc22abbc
Successfully tagged go-v4:latest
$ docker run --rm -it go-v4
Hello, world!

Well, that went fine. What about the final size?

$ docker images "go-v*"
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
go-v4               latest              8c2fcc22abbc        16 seconds ago      18.1MB
go-v3               latest              c5b936627ac1        About an hour ago   259MB
go-v2               latest              47b521392bfc        18 hours ago        710MB
go-v1               latest              11ba6fa5350b        19 hours ago        715MB

The final image is 18.1MB when using a multi-stage build with Google’s distroless image. That is almost 700MB less than our first image. Excellent!

Besides the size of the final image, there is something more you should notice: there are no extra binaries or libraries, and no shell is available in this image. Debugging is therefore harder, but the attack surface is kept to a minimum.

We recommend a multi-stage approach, with different stages integrated with your CI environment. For example:

  • debug stage with all debugging symbols, tools enabled
  • testing stage with your application that gets populated with test data
  • production stage with your app working on real data, no extra dependencies, shells
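
One way to lay this out in a single Dockerfile is to give each stage a name and pick one at build time with docker build --target. Below is a minimal sketch reusing the Go example. The stage names debug and production, and the image tags in the note that follows, are made up for illustration; a testing stage could be added the same way.

```dockerfile
# Build stage: compiles the app.
FROM golang:1.8 as build
WORKDIR /go/src/app
ADD . /go/src/app
RUN go-wrapper download
RUN go-wrapper install

# Debug stage: full golang image, so a shell and the Go tools are available.
FROM golang:1.8 as debug
COPY --from=build /go/bin/app /
CMD ["/app"]

# Production stage: distroless, no shell or package manager.
FROM gcr.io/distroless/base as production
COPY --from=build /go/bin/app /
CMD ["/app"]
```

Running docker build --target debug -t go-debug . stops at the debug stage, while a plain docker build -t go-prod . produces the last stage, production.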