Deploying software with Docker containers


Fabian Gringel



Here I will give a brief introduction to using Docker containers for software deployment. I assume that we want to deploy a Python application myapp that exposes a web API.


Overview


A Docker container is essentially a virtual machine intended to run a single application. Containers provide a way to isolate the app's runtime dependencies from the host system and therefore simplify deployments. Unlike conventional virtual machines, Docker containers run directly on the host kernel, relying on Linux kernel features like namespaces and cgroups to provide process isolation. This has a couple of important implications:

  • Containers have much less overhead compared to normal virtual machines (VMs), running essentially as fast as native processes (on Linux).

  • Docker can only really run on Linux, since it relies on specific kernel features. Docker is available for Windows and macOS, but there it runs on top of a Linux VM, with the performance penalties that entails.

  • Containers are less isolated from the host system than VMs and cannot be relied upon for security.

To run a Docker container, we need an image as a starting point, which packages the base distribution as well as all dependencies we need.

Docker provides a declarative way to specify and build such images using Dockerfiles, which are similar to shell scripts or Makefiles, although with a few important caveats, as we will see below.

Permissions

The Docker daemon requires either root permissions or that the user is a member of the docker group, which is essentially equivalent to root and should therefore be avoided. Non-root invocations fail with the following, somewhat opaque, error message:

docker ps
Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get "http://%2Fvar%2Frun%2Fdocker.sock/v1.40/containers/json": dial unix /var/run/docker.sock: connect: permission denied
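
For completeness, this is the standard way to add the current user to the docker group; as noted above, this is root-equivalent and should be avoided where possible:

# Root-equivalent: any member of the docker group can take over the host
sudo usermod -aG docker $USER
# Takes effect only after logging out and back in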


Writing Dockerfiles


A Dockerfile consists of a set of instructions to set up the environment, followed by the command we want Docker to run at startup.

On our local machines, we would do the following:

python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
# Start the server, listening on port 8080 on all interfaces
myapp api --host 0.0.0.0 --port 8080 --storage ./storage

We want to create a Docker image which gives us essentially the same results.

A first attempt

Translating the above shell script into Docker instructions is mostly straightforward:

FROM python:3.7
WORKDIR /app
COPY . .
RUN pip install -r requirements.txt
CMD ["myapp", "api", "--host", "0.0.0.0", "--port", "8080"]

The steps are almost the same:

  • We start from the base image python:3.7, an official image from Docker Hub. This is basically a Debian image with a global installation of Python 3.7.

  • We create and change into the /app directory. The following instructions are interpreted relative to /app.

  • We copy our code (and everything else in our build directory, see below) into the image.

  • We install our dependencies as above, using the RUN instruction. Note that we don’t need a virtual environment, since the container already provides isolation (but see below for a reason we might still want to use one).

  • We specify the command Docker is supposed to run when starting the container with CMD.

Avoiding common pitfalls

As often happens with Docker, the above straightforward approach has quite a few caveats.

  • The base image python:3.7 is quite large (~870 MB) and contains a lot of stuff we might not need. We can use python:3.7-slim instead, which is much smaller (~155 MB).

  • The above approach does not take advantage of the layer cache mechanism. Since we run COPY . . before the time-consuming RUN pip install ..., even minor code changes invalidate the cache for the expensive install step.

  • Our app runs as the root user inside the container, which is a security liability (although quite common).

The following Dockerfile is quite a bit closer to following best practices.

FROM python:3.7-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
RUN useradd appuser -m
USER appuser
CMD ["myapp", "api", "--host", "0.0.0.0", "--port", "8080"]

This is quite a bit messier, but addresses the above points:

  • We use python:3.7-slim as our base image.

  • We split off the requirements file into a separate COPY instruction and only copy the rest after installing the dependencies. Now we can freely change the code, and rebuilding the image will be almost instant (as long as we don’t touch the requirements).

  • We create a non-root user appuser and start the app as that user.

The layer system and cache

Docker uses a union filesystem to build the image in layers. Each step in the Dockerfile creates a new layer, for which Docker

  • computes the result of the step,

  • computes a hash of it,

  • stores both in the build cache,

  • applies the changes on top of the previous layers.

This allows Docker to avoid repeating computations and makes the image storage more efficient. If a layer and all its predecessors are cached, then Docker will use the cache and the corresponding step in the Dockerfile finishes much faster.

The hashes are computed as follows (a concrete illustration follows the list):

  • For COPY and ADD instructions, the hashes are computed from the copied files. Therefore, any changes to these files invalidate the cache.

  • For all other instructions, the hashes are computed from the commands in the Dockerfile.
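
To make this concrete, here is the improved Dockerfile again, annotated with what invalidates each layer's cache entry (the annotations are mine; comments must sit on their own lines, since Dockerfiles do not support trailing comments):

# Re-used from the cache unless the base image itself changes
FROM python:3.7-slim
# Hashed from the instruction text only
WORKDIR /app
# Hashed from the contents of requirements.txt
COPY requirements.txt .
# Hash is the command string; re-runs only when a layer above was invalidated
RUN pip install -r requirements.txt
# Hashed from all files in the build context: any code change invalidates from here on
COPY . .
# Invalidated together with the layer above, but cheap to re-run
RUN useradd appuser -m
USER appuser
CMD ["myapp", "api", "--host", "0.0.0.0", "--port", "8080"]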

Building the image

We can now build our improved image. This might take a few minutes the first time it is run if there are a lot of dependencies, but subsequent builds should only take a second (until we change the requirements again).

sudo docker build -f Dockerfile -t myapp-api:slim .

Note that the . at the end is the docker build context and should refer to the project root.
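
We can watch the cache at work by touching a source file (here a hypothetical myapp/api.py) and rebuilding; only the layers from COPY . . onward are rebuilt:

touch myapp/api.py
sudo docker build -f Dockerfile -t myapp-api:slim .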

The Docker build context

The Docker build context consists of all the files which might be copied into the image with a COPY instruction. By default, these are all the files in the build directory and all its subdirectories, which is usually quite a lot.

This slows down docker build, since the context gets computed every time, even if we only copy specific files. Additionally, using COPY . . as we did above leads to a lot of unnecessary and possibly sensitive files being included in the image.

To avoid a large build context, we can white- or blacklist certain files or directories in the .dockerignore file, as in the example below. This is analogous to .gitignore, although the two are implemented differently, so we cannot simply use a single file for both.
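
A minimal .dockerignore for our project might look like this (the entries are illustrative; adjust them to your layout):

# Example .dockerignore -- adjust to your project layout
.git
venv/
__pycache__/
*.pyc
storage/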

Running the container

We can execute the command specified in the Dockerfile as follows:

sudo docker run -p 8080:8080 --name myapp-container -d myapp-api:slim

The command line options do the following:

  • The -d flag starts the container in the background.

  • The --name option gives the container a name to make it easier to refer to it later.

  • The option -p 8080:8080 forwards port 8080 inside the container to port 8080 on the host (see below for restricting this to localhost).
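
By default, -p binds the port on all host interfaces. If the API should only be reachable from the local machine, docker run also accepts an explicit bind address:

sudo docker run -p 127.0.0.1:8080:8080 --name myapp-container -d myapp-api:slim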

We can check if our container is running with docker ps:

sudo docker ps
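
If everything worked, the API now responds on port 8080. What exactly it returns depends on myapp; assuming it serves HTTP at the root path, a quick smoke test looks like this:

curl -i http://localhost:8080/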

Debugging a container

We can execute other commands inside a running container:

sudo docker exec myapp-container whoami
appuser

This is very useful for debugging. For example, we can check whether our tests pass inside the built image:

sudo docker exec myapp-container pytest /app/tests/api -p "no:cacheprovider"

We can also log in to the container and execute arbitrary commands there (by default as appuser). This needs the -i (interactive) and -t (tty) switches:

sudo docker exec -it myapp-container bash
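
Another standard debugging tool is docker logs, which prints the stdout and stderr of the container's main process, i.e. of our API server:

sudo docker logs myapp-container
# add -f to follow the output, similar to tail -f
sudo docker logs -f myapp-container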

Conclusion


I hope the above walkthrough helps you to get started. Of course I could only cover Docker's core functionality here.

Be aware that, depending on your app and the security standard it has to meet, adding a non-root user might not suffice.