sebgoa on alpine, container, docker, linux, minimalist, python

Thoughts on the Use of Alpine Linux in Docker Images

Earlier this month, news broke that Docker was considering Alpine Linux as the base for its official images.
Alpine is a minimalist Linux distribution built on BusyBox, musl-libc, its own package manager called apk (not the Android one) and OpenRC as its init system. At roughly 5 MB, it is tempting to base Docker images on Alpine and treat a container as an embedded Linux “device” :). The great folks at Gliderlabs have written very good documentation on getting started with Alpine in Docker images.

For example, putting a simple Python Flask application in a container image can be done with:

    FROM alpine:3.3
    RUN apk add --no-cache python py-pip
    RUN pip install flask
    ADD /tmp/
    ENTRYPOINT ["python","/tmp/"]

The resulting image is 55 MB.

The first-order issue for most users is indeed container image size. When the first official images of Ubuntu, CentOS and Debian appeared on the Docker Hub, they checked in at ~700 MB, because each image contained the entire distribution. After installing your application in such an image, you could easily reach container images of a GB or more. At that point, the value proposition that container images were much smaller than VM images was almost moot. Large images strain local storage and bandwidth, and they increase the time it takes to pull images and to build your own.

Of course, a container image should contain only what the application needs; a single static binary would be the minimum. However, with so much knowledge and habit invested in packaging applications for standard distributions, it is no surprise that this is where everyone started.

That said, the traditional distros have since greatly reduced the size of their official container images:

  • CentOS 7: 196 MB
  • Ubuntu 14.04: 188 MB
  • Debian Wheezy: 85 MB (115 MB with Python added, using the Dockerfile below)

    FROM debian:wheezy
    RUN apt-get update && apt-get install -y python \
        && rm -rf /var/lib/apt/lists/*

One caveat is that this is not entirely for the best either, as we are now starting to see Dockerfiles that look like this:

    FROM scratch
    ADD rootfs.tar.xz /
    CMD ["/bin/bash"]

How that root file system is built, by whom, and how its integrity is checked when it is pushed to the Docker Hub is not always transparent. You may have to do some additional digging on GitHub to find a kickstart file or build manifest to figure out what is actually running.

The scratch image, which is empty, is very handy if your application is a single static binary: it yields the smallest container image possible. However, if you have dependencies, you may have to manage them by hand and potentially lose portability.
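As a rough sketch of the scratch approach, the Dockerfile below compiles a static binary in one stage and copies only that binary into scratch. It assumes a Go program in a hypothetical `main.go` that imports only the standard library, and an illustrative `golang:1.21` build image; neither comes from this post. Multi-stage builds (`AS build`, `COPY --from`) require Docker 17.05 or later.

```dockerfile
# Build stage. The golang:1.21 tag and main.go are illustrative assumptions.
FROM golang:1.21 AS build
WORKDIR /src
COPY main.go .
# CGO_ENABLED=0 makes the Go toolchain produce a binary that does not link against libc
RUN CGO_ENABLED=0 go build -o /hello main.go

# Final stage: start from the empty scratch image and copy in only the binary
FROM scratch
COPY --from=build /hello /hello
ENTRYPOINT ["/hello"]
```

Anything else the program needs at runtime, such as CA certificates or timezone data, would have to be copied into the final stage by hand, which is exactly the dependency management burden mentioned above.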

Back to Alpine. Some official images on the Docker Hub are starting to offer Alpine variants. Python, for example, still has its regular official image based on buildpack-deps:jessie, but it also has an Alpine variant. The difference in size is huge:

  • python:2.7: 676 MB
  • python:2.7-alpine: 72 MB

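Compared with the optimized Debian Wheezy image with Python in it (115 MB), however, the gap shrinks to less than 2x. A factor of about 1.6 is still significant, but much less dramatic than the roughly 10x gap with the regular image (676 MB vs 72 MB). Interestingly, the Alpine-based Python image compiles Python from source rather than installing it from the apk repository.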
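For comparison with the apk-based Flask example at the top of this post, a sketch built on the official Alpine variant instead might look like this. The application file `app.py` and the `/app` directory are hypothetical, and `python:2.7-alpine` is simply the tag measured above; since Python 2.7 is end-of-life, a Python 3 Alpine tag would be the sensible choice today.

```dockerfile
# Official Python image, Alpine variant (Python built from source, not taken from apk)
FROM python:2.7-alpine
# pip ships with the official image; --no-cache-dir keeps pip's download cache out of the layer
RUN pip install --no-cache-dir flask
# app.py is a hypothetical application file
COPY app.py /app/app.py
ENTRYPOINT ["python", "/app/app.py"]
```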

This means that choosing Alpine or not becomes more of a tradeoff than a clear-cut answer. You will have to decide whether you really need to save the space and adopt what is, for now, still a non-standard Linux distribution, or opt for a well-known distribution and benefit from package repositories that you know and trust, at the cost of a bit of space.

Personally, I will use Alpine for quick prototyping and testing when I cannot easily use scratch, and keep using CentOS or Debian in production to:

  • Avoid fighting with rebuilding my libraries against musl-libc
  • Avoid having to use and learn a new package manager
  • Benefit from the quick security updates of the traditional distros
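To illustrate the first point: Python packages with C extensions have to be compiled against musl-libc on Alpine, because manylinux wheels built against glibc do not apply there, so a compiler has to come into the build. A common pattern, sketched below, installs the build tools as an apk virtual package and removes them in the same layer. It assumes `simplejson` (chosen only as an example of a package with an optional C extension) is the dependency, and uses the Alpine 3.3 package names; later Alpine releases renamed some Python packages.

```dockerfile
FROM alpine:3.3
RUN apk add --no-cache python py-pip
# Install the compiler toolchain and headers under the virtual name .build-deps,
# build the C extension against musl-libc, then remove the toolchain in the same layer
RUN apk add --no-cache --virtual .build-deps gcc musl-dev python-dev \
    && pip install simplejson \
    && apk del .build-deps
```

Removing `.build-deps` in the same `RUN` instruction matters: deleting it in a later instruction would leave the compiler in an earlier layer and keep the image large.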

What do you think?