Storage with Docker volumes

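When a container is removed (with the docker rm command) and a new one is created, no data is carried over. In Docker, containers are meant to be treated as immutable and disposable. Docker relies on a copy-on-write (COW) file system, such as OverlayFS, in which the image layers are read-only; each container only adds a thin writable layer on top, and that layer is deleted together with the container.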
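The loss of data on removal can be observed directly. The following is a minimal sketch, assuming Docker is installed and the alpine image can be pulled; the function name, the container name scratchpad and the file /note.txt are made up for the illustration.

```shell
# Hypothetical demo: write a file in a container, remove the container,
# then look for the file in a fresh container started from the same image.
show_ephemeral_writes() {
  image="${1:-alpine}"

  # The file lands in the container's own writable layer.
  docker run --name scratchpad "$image" sh -c 'echo hello > /note.txt'

  # Removing the container discards that writable layer.
  docker rm scratchpad

  # A new container starts again from the read-only image layers,
  # so this check is expected to report "gone".
  docker run --rm "$image" sh -c \
    'if [ -f /note.txt ]; then echo "still there"; else echo "gone"; fi'
}
```

Calling show_ephemeral_writes without arguments runs the three steps against alpine. Nothing in the image changes between the two runs, which is why any state has to live outside the container.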

In practice, however, an application almost always needs to keep some state, whether in a database or in simple flat files. Docker provides three features (Data Volume, Data Volume Container and Named Volume) to satisfy this requirement. The idea is to mount files from the host (or from another container) into the container, so that the files remain after the container is removed.

The Data Volume

The idea is to mount a host directory in the container. It can be done in either of the following ways:

  1. In the Dockerfile with the VOLUME ["containerDir"] command: we define the container directory where the volume will be mounted. The Dockerfile cannot name a host directory; Docker picks a location on the host itself unless one is given at run time.

  2. Specify the volume as we start the container: the docker run -v hostDir:containerDir command creates a volume in the container and mounts the host directory in the volume. If hostDir does not exist, Docker will create it, owned by the root user.

Example 1. Postgres
$ docker run -v /var/data/:/etc/postgresql postgres

The main limitation of mounting a host folder is the risk of losing portability. You have to ensure that the access rights (read/write) on the folder, as well as the filesystem, are the same on all hosts.
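One way to catch such a mismatch before starting the container is to check the owner of the host folder. The sketch below assumes a Linux host with GNU coreutils (for stat -c); the function name and its messages are hypothetical, and the expected UID is whatever numeric user your container process runs as.

```shell
# Hypothetical helper: report whether a host directory is owned by the
# numeric UID that the containerised process is expected to run as.
check_host_dir_owner() {
  dir="$1"
  expected_uid="$2"

  # stat -c '%u' prints the owner's numeric UID (GNU coreutils syntax).
  actual_uid=$(stat -c '%u' "$dir") || return 2

  if [ "$actual_uid" = "$expected_uid" ]; then
    echo "ok: $dir is owned by uid $actual_uid"
  else
    echo "mismatch: $dir is owned by uid $actual_uid, expected $expected_uid"
    return 1
  fi
}
```

For instance, check_host_dir_owner /var/data "$(id -u)" compares the folder from Example 1 with the current user. Running the same check on every host is a cheap way to detect the portability problem early.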

The Data Volume Container (or Data Only Container)

To address the limitations of data volumes, namely regarding portability, Docker standardised the Data Volume Container (DVC). The data is stored in a volume held by a dedicated container instead of a directory chosen on the host, and as long as that container is not removed, the data stays in place. The -v option is replaced by --volumes-from.

Example 2. Dockerfile of the DVC
  FROM alpine:latest
  RUN mkdir /dataDir
  VOLUME /dataDir

The DVC declares volumes and makes them available to other containers. The container is created with the command docker create --name data-volume-name followed by the image name (or with docker run --name data-volume-name … true; here true is not an option but the command executed in the container, and since it exits immediately, the container is created and stops right after).

Example 3. Creation of the DVC
$ docker create --name data-volume-name ubuntu
Example 4. Mounting the DVC
$ docker run --name my-container-name --volumes-from data-volume-name postgres

This solution may sound great, until the first time we use it and run into user ID and permission constraints. The user in the container that uses the volume may not have sufficient rights on the files of the DVC. It may not even exist in the DVC. To solve this problem we must ensure that the same user, with the same ID, exists in both containers.

Solution 1: Create a user with the same ID in each of the two images.

Solution 2: Use the same image (with a user) for the two containers. One of the benefits of sharing an image is the disk space optimisation provided by the COW file system (as mentioned previously, running multiple containers from one image should have no impact on disk space).
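Whichever solution is chosen, it can be checked by comparing the numeric user IDs of the two images. This is a minimal sketch, assuming both images ship the id utility; the function names are hypothetical and the image names passed to them are up to you.

```shell
# Print the numeric UID that a container started from the given image
# runs as by default. Assumes the image contains the `id` utility.
image_uid() {
  docker run --rm --entrypoint id "$1" -u
}

# Compare two images and report whether their default users share a UID.
same_uid() {
  uid_a=$(image_uid "$1") || return 2
  uid_b=$(image_uid "$2") || return 2

  if [ "$uid_a" = "$uid_b" ]; then
    echo "same uid ($uid_a)"
  else
    echo "different uids ($uid_a vs $uid_b)"
    return 1
  fi
}
```

With the second solution the comparison is trivially true, since both containers come from the same image. With the first one, running same_uid on the DVC image and the consumer image shows whether the two users really line up.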

Example with Maven :

I have a container I use to build my project with Maven, and another container to persist the Maven repository, so that there will be no need to download dependencies for each build.
Example 5. Create DVC from maven image
$ docker create --name maven_local_repository maven:3.3.3-jdk-8
Example 6. Run build container from maven image
$ docker run -v "$PWD/my_project_dir":/usr/src/app -w /usr/src/app \
		--volumes-from maven_local_repository \
		maven:3.3.3-jdk-8 mvn clean install

The Named volumes

Both patterns we just saw (Data Volume and DVC) look a lot like hacks. The first writes to the host and the second to a dormant container. In addition, neither of them is accessible over the network; their use for persistence is valid only within a single host.

In practice, the data of an application must be able to be regularly moved, backed up and restored on one or more hosts. This is one of the reasons behind the delay in the adoption of Docker (in production, to be precise) for databases.

Named Volumes appeared with release 1.9, through the Volumes API, to harmonise the use of volumes and to add more flexibility.

Example 7. Creation of a named volume
$ docker volume create --name maven-repo-volume
Example 8. Mounting named volume
$ docker run --rm -v maven-repo-volume:/root/.m2/ maven:3.3.3-jdk-8 mvn clean install

We use the -v option (or --volume) instead of --volumes-from.
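Since the data of an application has to be backed up and restored regularly, a named volume can be archived from a throwaway container that mounts both the volume and a host folder. The sketch below assumes the alpine image (whose tar supports the options used); the function names and the /data and /backup mount points are hypothetical choices.

```shell
# Archive the contents of a named volume into <dest_dir>/<volume>.tar.gz.
backup_volume() {
  volume="$1"
  dest_dir="$2"   # absolute path on the host

  # The volume is mounted read-only so the backup cannot alter it.
  docker run --rm \
    -v "$volume":/data:ro \
    -v "$dest_dir":/backup \
    alpine tar czf "/backup/$volume.tar.gz" -C /data .
}

# Unpack an archive produced by backup_volume into a named volume.
restore_volume() {
  volume="$1"
  src_dir="$2"    # absolute path on the host

  docker run --rm \
    -v "$volume":/data \
    -v "$src_dir":/backup:ro \
    alpine tar xzf "/backup/$volume.tar.gz" -C /data
}
```

For example, backup_volume maven-repo-volume /var/backups writes /var/backups/maven-repo-volume.tar.gz on the host, and restore_volume maven-repo-volume /var/backups unpacks it again, possibly on another host after copying the archive over.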

Shared volume (Convoy and Flocker)

By default, Named Volumes are available only on the local host, but they can be shared through the network when configured with a dedicated driver. Docker has supported volume drivers since version 1.8, and you will likely have to choose between the two best-known implementations, Convoy from Rancher Labs and Flocker from ClusterHQ.

They work in the following way: we create a volume, take snapshots of the volume and save them somewhere in the cloud (AWS for example), so that the volume can be restored on any host. Writing to the volume is therefore like writing to the cloud.

Example 9. with Flocker
$ docker volume create -d flocker --name my-named-volume -o size=20GB

The key point …​

Volumes are useful in several use cases: persisting a Maven repository, saving log files or managing environment variables. In many cases, such as a database or a Maven repository, a Named Volume will do the job. The Data Volume may be a solution in cases where the container's configuration data, environment variables for example, is provided by the host.

After a long period of using volumes you will inevitably face the issue of dangling volumes: ghost volumes lurking on the host without being tied to any container. They take up disk space. This issue can be solved by removing the volumes, with docker-compose down -v or docker volume rm <volume name>. The major risk you are exposed to, however, is data corruption, which can happen when several containers share the same volume.
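Dangling volumes can be listed with the dangling filter of docker volume ls before they are removed. The helpers below are a minimal sketch with hypothetical function names; they delete data permanently, so the list is worth reviewing first. Newer Docker releases also offer docker volume prune for the same purpose.

```shell
# List the names of volumes that no container references.
list_dangling_volumes() {
  docker volume ls -q -f dangling=true
}

# Remove every dangling volume, one by one.
remove_dangling_volumes() {
  list_dangling_volumes | while IFS= read -r volume; do
    [ -n "$volume" ] || continue
    # stdin is redirected so the command cannot consume the list being read.
    docker volume rm "$volume" </dev/null
  done
}
```

Running list_dangling_volumes on its own is harmless and shows what remove_dangling_volumes would delete.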

Docker definitely has a long way to go to reach the level of traditional persistence technologies.