A few additional details on the Dockerfile

Previously, in the presentation of Docker and container technology, we discussed the features on which Docker is based: cgroups and namespaces for process isolation, and a copy-on-write (COW) file system to optimize disk space. In this article we will take a close look at the Dockerfile, the file that describes an image and therefore the containers created from it. Even though the number of commands is limited, their subtleties may not be apparent on first use. The official website is the best source of documentation. Here we will focus only on a few basic commands: ADD, COPY, CMD and ENTRYPOINT.

The examples were run with version 1.11 of Docker. They will be updated where necessary.

Understanding layers

Each command of the Dockerfile (ADD, COPY, ENTRYPOINT, CMD, …) adds a new (COW) layer to the image, and the fewer layers we have, the better it is for the image size. The number of layers is also limited (it used to be 42 and has more recently been raised to 127). Even if that limit is hardly a constraint any more (a Dockerfile with 127 lines has a real problem), combining commands also makes the Dockerfile easier to understand.

Tip: Combine commands as far as possible

Example 1. Dockerfile (with 4 layers)

RUN curl -fsSLO http://archive.apache.org/dist/maven/maven-3/$MAVEN_VERSION/binaries/apache-maven-$MAVEN_VERSION-bin.tar.gz
RUN tar xzf apache-maven-$MAVEN_VERSION-bin.tar.gz -C /usr/share
RUN mv /usr/share/apache-maven-$MAVEN_VERSION /usr/share/maven
RUN ln -s /usr/share/maven/bin/mvn /usr/bin/mvn

Example 2. "Refactored" Dockerfile in only one command

RUN curl -fsSL http://archive.apache.org/dist/maven/maven-3/$MAVEN_VERSION/binaries/apache-maven-$MAVEN_VERSION-bin.tar.gz | tar xzf - -C /usr/share \
  && mv /usr/share/apache-maven-$MAVEN_VERSION /usr/share/maven \
  && ln -s /usr/share/maven/bin/mvn /usr/bin/mvn

The number of layers is reduced to one, and the Dockerfile is more readable.
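To check the effect of such a refactoring, one can count the instructions in a Dockerfile. The function below is only a sketch: count_instructions is a hypothetical helper (not a Docker command), it assumes the article's model in which each instruction produces one layer, and the RUN lines fed to it are placeholders.

```shell
#!/bin/sh
# Count the instructions of a Dockerfile read from stdin.
# Blank lines and comment lines are skipped; a line ending in a
# backslash is treated as continued on the next line.
count_instructions() {
    awk '
        continued                    { continued = /\\[ \t]*$/; next }
        /^[ \t]*#/ || /^[ \t]*$/     { next }
                                     { n++; continued = /\\[ \t]*$/ }
        END                          { print n + 0 }
    '
}

# Example 1 style: four separate RUN lines (placeholder commands).
printf 'RUN a\nRUN b\nRUN c\nRUN d\n' | count_instructions          # prints 4

# Example 2 style: one RUN chained with && and backslashes.
printf 'RUN a \\\n  && b \\\n  && c\n' | count_instructions         # prints 1
```

On a real file the call would be count_instructions < Dockerfile.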

The cache

All the layers of an image are cached. When a build is launched, Docker first checks the cache for layers that can be reused, which shortens the build by avoiding unnecessary work such as downloads. As a result, after a modification, only the lines affected by the change are executed again.

FROM java:openjdk-8-jdk  //(1)
MAINTAINER Yakhya DABO  //(2)
ENV MAVEN_VERSION 3.3.3  //(3)
ENV PROJECT_DIR /usr/src/app  //(4)
ENV M2_HOME /usr/share/maven  //(5)
...

...
RUN mkdir -p $PROJECT_DIR //(8)
COPY config/settings.xml $M2_HOME/conf/ //(9)
RUN curl -fsSL http://archive.apache.org/dist/maven/maven-3/$MAVEN_VERSION/binaries/apache-maven-$MAVEN_VERSION-bin.tar.gz | tar xzf - -C /usr/share \
&& mv /usr/share/apache-maven-$MAVEN_VERSION /usr/share/maven \
&& ln -s /usr/share/maven/bin/mvn /usr/bin/mvn  //(10)
VOLUME $PROJECT_DIR  //(11)
WORKDIR $PROJECT_DIR  //(12)

If we change the line marked (9), the cache is invalidated for every line that follows it: lines (10), (11) and (12) will be replayed. Lines (1) to (8) come before line (9), so they are not affected by the change and the cache will be used for them.
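The rule can be modelled in a few lines of shell. This is a deliberately simplified sketch, not Docker's actual cache implementation: cache_plan is a hypothetical function that takes the number of the changed line and the total number of lines, and reports what happens to each one.

```shell
#!/bin/sh
# cache_plan CHANGED TOTAL
# Print, for each line number, whether its layer is taken from the
# cache or rebuilt: everything before the changed line is cached,
# the changed line and everything after it is rebuilt.
cache_plan() {
    changed=$1
    total=$2
    i=1
    while [ "$i" -le "$total" ]; do
        if [ "$i" -lt "$changed" ]; then
            echo "$i cached"
        else
            echo "$i rebuilt"
        fi
        i=$((i + 1))
    done
}

# The Dockerfile above has 12 numbered lines; line 9 is modified.
cache_plan 9 12
```

The output lists lines 1 to 8 as cached and lines 9 to 12 as rebuilt.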

Two tips to make the most of the cache:

Place lines that change often (adding a jar with COPY/ADD, for example) as low as possible.
Put expensive operations (downloading with RUN curl, for example) as high as possible in the Dockerfile, in order to avoid replaying them on every change.
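The benefit of these two tips can be estimated with a rough cost model. The sketch below is an illustration only: rebuild_cost is a hypothetical helper, and the durations (60 seconds for a download, 1 second for copying a jar) are made-up figures, not measurements.

```shell
#!/bin/sh
# rebuild_cost CHANGED COST...
# Sum the costs of the changed line and of every line after it,
# i.e. the work that is replayed when the cache is invalidated.
rebuild_cost() {
    changed=$1
    shift
    i=1
    total=0
    for cost in "$@"; do
        if [ "$i" -ge "$changed" ]; then
            total=$((total + cost))
        fi
        i=$((i + 1))
    done
    echo "$total"
}

# Jar copied first (1s), download after it (60s); the jar changes.
rebuild_cost 1 1 60     # prints 61

# Download first (60s), jar copied last (1s); the jar changes.
rebuild_cost 2 60 1     # prints 1
```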

ADD and COPY commands

These two commands are easily confused. Their names and syntax suggest that they do the same thing, but in practice this is not always the case.

 ADD	src	dest
 COPY	src	dest

src: a folder (or a file) on the host

dest: a folder (or a file) in the container

A quick reminder: the "docker build" command takes the build context as its argument, and the -f option gives the path to the Dockerfile

 $ docker build -f path/to/Dockerfile contextDir

 $ docker build contextDir  # (if the Dockerfile is at the root of contextDir)

src is relative to contextDir.

dest is either a filename (/var/opt/fileName) or a folder’s name (/var/opt/).

COPY simply copies a file (or directory) from the host into the container.

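ADD can do the same thing, but its src can also be a URL, in which case Docker downloads the file into dest. If src is a local tar archive (optionally compressed with gzip, bzip2 or xz) and dest is a folder, Docker extracts src into dest (ADD file.tar.gz /var/opt/). Other formats such as zip are copied as-is, and so are archives downloaded from a URL. All of this is very confusing …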

The Docker documentation recommends preferring COPY over ADD, except in specific cases. And if you are steeped in the Unix philosophy (do one thing and one thing only) or the single responsibility principle (SRP), you will see ADD as a legacy command that does too much. It is better not to have it in your Dockerfile.

COPY is sufficient for simple copy operations from the host to the container, and RUN with existing tools such as tar, unzip, wget, curl, … covers the cases where we want to extract archives or download files.
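As an illustration, here is the kind of explicit extraction a RUN line would perform in place of ADD's implicit behaviour. It is a sketch that runs on any machine with tar, without Docker; the temporary directory, the archive name and the file content are all made up for the example.

```shell
#!/bin/sh
set -e

# Build a small archive to play the role of the file COPY would bring in.
work=$(mktemp -d)
mkdir -p "$work/src/conf" "$work/dest"
echo "hello" > "$work/src/conf/settings.xml"
tar czf "$work/archive.tar.gz" -C "$work/src" conf

# The explicit step: extract the archive under the destination folder,
# as "RUN tar xzf archive.tar.gz -C /dest" would do inside the image.
tar xzf "$work/archive.tar.gz" -C "$work/dest"

# Read the extracted file back, then clean up.
extracted=$(cat "$work/dest/conf/settings.xml")
rm -rf "$work"
echo "$extracted"
```

With this approach the Dockerfile states exactly which tool handles the archive, instead of relying on ADD to guess from the file type.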

CMD and ENTRYPOINT commands

In practice these two commands can produce the same result: running the startup script of the container. But according to the documentation, ENTRYPOINT is used to configure a container at startup, while CMD provides the default startup command of the container.

Example 3. Usage of CMD

FROM maven:3.3.3-jdk-8
WORKDIR projectDir
...
CMD ["mvn", "clean", "install"]

$ docker run my_maven_image  # will execute mvn clean install

If we want to override the command:

$ docker run my_maven_image mvn clean verify

Example 4. Usage of ENTRYPOINT

For my git container I need the committer's parameters (user.name and user.email) at the startup of the container. I can therefore use ENTRYPOINT to set them.

FROM git:2.0
….
COPY entrypoint.sh /var/lib
ENTRYPOINT ["/var/lib/entrypoint.sh"]

$ docker run -e GIT_USER_NAME=username -e GIT_USER_EMAIL=email my_git_image git commit -m "xxxxx"

The content of my ENTRYPOINT

 #!/bin/bash

 set -e

 git config user.name "$GIT_USER_NAME"
 git config user.email "$GIT_USER_EMAIL"

 exec "$@"

It is important to note that ENTRYPOINT receives CMD as its arguments (the "$@" of the script). When no ENTRYPOINT is defined (in the Dockerfile or with --entrypoint as a parameter of docker run), CMD itself becomes the command to run, possibly with its parameters. Written in shell form, it is run through /bin/sh -c, which takes the command as a parameter; this is why /bin/sh -c is often described as the default entrypoint.
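The relationship between the two commands can be imitated with two shell functions. This is only a model to make the mechanism visible, not how Docker is implemented: entrypoint and run_container are hypothetical names, and the echo commands stand in for a real CMD.

```shell
#!/bin/sh
# Model of ENTRYPOINT: configure, then run whatever was passed in.
entrypoint() {
    echo "configuring for ${GIT_USER_NAME:-unknown}"
    "$@"    # the real script uses exec "$@" so the command replaces the shell
}

# Model of docker run: arguments given after the image name replace CMD;
# without arguments, the default CMD is used.
run_container() {
    if [ "$#" -eq 0 ]; then
        set -- echo "default command"    # stands in for CMD
    fi
    entrypoint "$@"
}

GIT_USER_NAME=alice
export GIT_USER_NAME

run_container                      # configures, then prints: default command
run_container echo "overridden"    # configures, then prints: overridden
```

In both calls the configuration step runs first; only the command that follows it changes.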

Example 5. With New Relic

I use New Relic in production to monitor the JVM of my container. But I also want the option of doing without it, in a dev environment for example.

The simplest solution is to have two different images, one for production, with New Relic, and the second for dev, without New Relic. But this option does not respect a principle of Continuous Delivery: "The artifact should remain the same in all environments".

A second solution, which is my favorite, is to let ENTRYPOINT decide whether or not to run New Relic, depending on whether the NEWRELIC_KEY and NEWRELIC_APP_NAME environment variables are set.

To launch the container in production:

 $ docker run -e NEWRELIC_KEY=XXXXXXXX -e NEWRELIC_APP_NAME=my_app_name my_service_image

… and in dev:

 $ docker run my_service_image

In my ENTRYPOINT I can put the init script of the execution environment, which fills in the config files from environment variables given as parameters (name, key, url, password, login, …), and use CMD to specify the command to execute once the environment is initialised.

Dockerfile
….
ENTRYPOINT ["entrypoint.sh"]
CMD ["java","-javaagent:/opt/newrelic/newrelic.jar","-jar","app.jar"]

entrypoint.sh

#!/bin/sh

set -e


if [ -z "$NEWRELIC_KEY" ]; then
        # No key: start the application without the New Relic agent, ignoring CMD
        exec java -Djava.security.egd=file:/dev/./urandom -jar app.jar
else
        if [ -z "$NEWRELIC_APP_NAME" ]; then
                echo >&2 'error: missing required environment variable'
                echo >&2 'error: NEWRELIC_APP_NAME must be set when using New Relic'
                exit 1
        fi

        # NEW_RELIC_DIR is assumed to be defined in the image (with ENV, for example)
        NEW_RELIC_CONFIG_FILE="$NEW_RELIC_DIR/newrelic.yml"
        cp "$NEW_RELIC_CONFIG_FILE" "$NEW_RELIC_CONFIG_FILE.original"

        # Override key and app_name
        sed -i -e "s/app_name:\ My\ Application/app_name:\ ${NEWRELIC_APP_NAME}/g" "$NEW_RELIC_CONFIG_FILE"
        sed -i -e "s/'<\%= license_key \%>'/${NEWRELIC_KEY}/g" "$NEW_RELIC_CONFIG_FILE"

        # Run CMD (java with the -javaagent option) in place of this shell
        exec "$@"
fi
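The decision taken by this script can be isolated in a function and checked without starting a container or a JVM. The sketch below is one possible way to do it, under assumptions: agent_mode is a hypothetical helper that only reports the branch the entrypoint would take, using the same two environment variables.

```shell
#!/bin/sh
# agent_mode
# Print "plain" when NEWRELIC_KEY is empty, "newrelic" when both the key
# and NEWRELIC_APP_NAME are set, and fail when the key is set without
# an application name (the same three cases as the entrypoint).
agent_mode() {
    if [ -z "$NEWRELIC_KEY" ]; then
        echo plain
    elif [ -z "$NEWRELIC_APP_NAME" ]; then
        echo >&2 'error: NEWRELIC_APP_NAME must be set when using New Relic'
        return 1
    else
        echo newrelic
    fi
}

# Dev: no variables set.
NEWRELIC_KEY=""
NEWRELIC_APP_NAME=""
agent_mode    # prints: plain

# Production: both variables set (placeholder values).
NEWRELIC_KEY="XXXXXXXX"
NEWRELIC_APP_NAME="my_app_name"
agent_mode    # prints: newrelic
```

Keeping the decision in one small function makes the three cases easy to exercise before the image is built.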

Key points

It is very important to understand how the cache works in order to reduce the build time of images, which is an essential constraint in Continuous Delivery.

COPY should always be preferred over ADD.

Limit yourself to CMD for simple commands that need no configuration from the container, and use ENTRYPOINT + CMD when you need to apply configuration to the container before launching the command.