Blog
3 September 2021
Alexey Igrychev, software developer

werf vs. Docker: What is the difference when it comes to building images?

This article is the next installment in the “werf vs. …” series. In the previous article, we discussed the ways that werf differs from Helm. This article compares werf with an even more basic tool: Docker.

Our followers often wonder what the point is of building images with werf when Docker and its Dockerfiles are available — and they do their job pretty well, don’t they? Our typical response is that werf is about more than just building images: it covers the complete CI/CD cycle of delivering applications to Kubernetes and uses Docker as an auxiliary tool under the hood. Obviously, this explanation is too basic. It calls for a dive into greater detail.

This article is mainly intended for those who know little or nothing about werf but have had some experience with Docker. First, as we did in the case of Helm, we will try to figure out whether comparing these two solutions is worthwhile at all.

Overall CI/CD picture

Let’s start with the general plan and compare Docker with werf in the context of the CI/CD pipeline and deploying applications to Kubernetes. This one is simple:

Function | Docker | werf
Building an application (Docker) image | + | +
Pushing images to the container registry | + | +
Cleaning up outdated images in the container registry based on pre-configured policies | – | +
Deploying to Kubernetes | – | +

Docker is a mature, widely adopted, yet basic solution. It provides ample opportunities for working with images directly at a low level: building, tagging, and running containers, as well as pulling images from the container registry and pushing them to it.

werf is a high-level tool. The werf user deals with the application (and its components, if necessary) instead of images and containers, with all the low-level mechanics left under the hood. While Docker offers hundreds of mechanisms for working with images and containers, werf exposes only a dozen: it handles everything else by itself.

werf is not just an alternative to Docker: its main purpose is to deliver applications to Kubernetes. To that end, it uses Docker as one of its components and “glues” it to Git, Helm, and Kubernetes. werf thus simplifies building CI/CD pipelines on top of these standard tools and whatever CI system the user chooses.

What werf can do (and Docker cannot)

While using Docker to build images and work with them, werf also introduces new features into the build process and lifts some tasks off the user’s shoulders. For example, users do not have to worry about tagging images or cleaning up the container registry. Here is how the two compare on a per-feature basis:

Feature | Docker | werf
Image building using Dockerfiles | + | +
Image building using Stapel (a native werf syntax) | – | +
Parallel builds | + | +
Distributed builds | – | +
Debugging built images | – | +
Tagging images with custom tags | + | –
Automatic image tagging | – | +
Image publishing | + | –
Automatic image publishing | – | +
Running images | + | +
Assembly artifacts clean-up on the host | + | +
Automatic assembly artifacts clean-up on the host | – | +
Container registry clean-up | – | +
Full toolset for managing images and containers | + | –
Support for Giterminism (our approach to GitOps) | – | +

How werf uses Docker

Containerization has long been an integral part of application development. This virtualization technology was introduced long before Docker. Docker’s contribution is different:

  • it defined the standard for containerization; in other words, the way applications should be packed into containers, in what order, and with what tools;
  • Docker simplified the UX, that is, the way the user works with containers.

Docker is based on a client-server architecture. The Docker engine combines a client (software for working directly with Docker) and a host (server). The daemon running on the host manages Docker objects, including images and containers.

werf acts as an alternative Docker client. It uses the Docker Engine SDK to interact with the Docker daemon via the API.

The Docker daemon runs on a local or remote host. When a Dockerfile is used for building, werf can work with the Docker daemon both locally and remotely over a TCP socket.

In the case of the Stapel builder, the host’s service directories need to be mounted to assembly containers for werf to work; thus, the remote mode is not supported yet.

A detailed comparison of werf and Docker

1. Dockerfile and the alternative Stapel syntax

Docker supports a single format, the Dockerfile, which has become the de facto standard across build tools.

werf can also use Dockerfiles for building. In fact, the easiest way to get started with werf is to use your project’s existing Dockerfile (if you have one): all you need to do is add a corresponding entry to the werf.yaml configuration file. Here is an example:

project: my-project
configVersion: 1
---
image: example
dockerfile: Dockerfile

The Dockerfile syntax, when used with werf, is the same as the regular one. Below is an example of a Dockerfile for building a basic Node.js application:

FROM node:14-stretch
WORKDIR /app

RUN apt update
RUN apt install -y tzdata locales

COPY package*.json .
RUN npm ci

COPY . .

CMD ["node","/app/app.js"]

In addition to Dockerfiles, werf supports its own Stapel syntax.

In the Stapel syntax, assembly instructions are contained in werf.yaml instead of a stand-alone file. Here is an example that does the same thing as the one above. However, this time we will use the Stapel syntax:

image: example
from: node:14-stretch
git:
- add: /
  to: /app
  stageDependencies:
    setup: 
    - package*.json
shell:
  beforeInstall: 
  - apt update
  - apt install -y tzdata locales
  setup: 
  - npm ci
docker:
  WORKDIR: /app
  CMD: ["node", "/app/app.js"]

The main advantage (and the key feature) of the Stapel builder is integration with Git and the way it processes source code. These two features significantly reduce the time required for incremental builds.

The Dockerfile format relies on the COPY and ADD instructions for working with files. A layer is rebuilt each time the files it adds are changed — and so is every layer after it. Thus, you must use these instructions with care, pairing specific files with the assembly instructions that depend on them. For example:

COPY package*.json .
RUN npm ci

But what if the assembly instructions involve all the files while the rebuild has to be done only when specific files are changed?

Dockerfile offers no solution to that problem: all the assembly instructions are re-run whenever any of the files change.

In the Stapel builder, however, the step of adding source files to the image is much more flexible. The user can define dependencies that trigger the assembly instructions for the current files.

The following are Stapel’s key advantages:

  • Stapel’s assembly instructions are similar to those of Dockerfile but more flexible and advanced.
  • While Dockerfiles only support the RUN instruction for executing commands in a shell, Stapel supports both shell commands and Ansible tasks.
  • Improved handling of build configurations through the use of YAML and templating.
  • A Stapel image can be built based on another Dockerfile or Stapel image defined in werf.yaml.
  • Stapel can use mounts to speed up builds and decrease the size of the resulting images.
  • The Stapel syntax provides an import directive that is similar to Docker multi-stage builds. The directive can import files from other Dockerfile or Stapel images declared in werf.yaml, representing another option to reduce the size of the resulting image and speed up build time by reusing existing images.
  • Stapel-syntax-based builds provide more efficient layer caching mechanics as well as some additional handy features.
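To illustrate the import directive mentioned above, here is a hedged sketch of a two-image werf.yaml (image names, paths, and the nginx base are illustrative; consult the werf documentation for the exact directive fields):

```yaml
image: builder
from: node:14-stretch
git:
- add: /
  to: /app
shell:
  setup:
  - cd /app && npm ci && npm run build
---
image: example
from: nginx:stable-alpine
import:
# take only the build output from the builder image,
# keeping node_modules and sources out of the final image
- image: builder
  add: /app/dist
  to: /usr/share/nginx/html
  after: setup
```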

2. Building output

The regular Docker builder shows a bare minimum of information: all the user sees in the logs is the assembly instructions and their output. The Docker builder reports neither how long the whole process (or each instruction) took nor the size of the intermediate layers. As a result, when there are many images (and the assembly instructions are verbose), users find it hard to tell which image or instruction a given part of the log refers to.

Docker output

The output becomes more meaningful if the BuildKit builder is used (for more information about BuildKit, see the “Parallel builds” section below). It provides additional information, such as total build time and the time spent on each step individually.

BuildKit’s output is also interactive: while an instruction is running, a few lines of its output are shown, and once it completes, those lines are erased. The disadvantage is that you cannot inspect the output after the build is complete. Moreover, in most CI/CD systems, the terminal is non-interactive, so the advantages of BuildKit’s output fade away.

Output of Docker and BuildKit

In werf, we focus on letting the user know what is currently taking place, what image is being built, and what instruction is being executed.

At the same time, the output must be informative and unclogged to facilitate analysis of the build history and troubleshooting problems. The output must clarify how the build time and the size of the resulting image are distributed over the build steps. That way, you can identify the steps that are to be optimized. For example, you can get rid of the unnecessary files in the resulting image or reorder the instructions and adjust their dependencies in order to optimize caching.

werf’s output

The screenshot above displays part of the dev image build output. As you can see, two stages are being built, beforeSetup and setup. Each user instruction is prefixed with the image name and stage. werf shows the total image build time, the time it takes for all of its stages to complete, as well as the time it takes to run a specific instruction. After the stage is complete, werf displays statistics on how much the size of the image has grown at that step.

3. Building, tagging, and publishing images

With Docker, there are at least two commands you have to invoke:

  1. docker build to build an image (accompanied by docker build --tag to add a tag to it during the build);
  2. docker push <tag> to push it to the container registry.

With werf, you can do all of the above in just one command: werf build.

The main differences between werf build and docker build

  • The werf build command sequentially builds, tags, and publishes each layer, while docker build only tags the resulting image — pushing it to the registry is a separate step.
  • In werf build, each stage that makes up the resulting image gets a specific name (content-based tagging) while docker build names only the end image. As a result, in the case of werf, all intermediate layers are listed in docker images, while in the case of Docker, they are not.
  • werf build can run in two different modes: local and distributed. In local mode, it produces a set of specifically named stages of the built images. In distributed mode, it creates the same set of stages and pushes them to the container registry.
  • As a result, werf has no separate tag and push commands: their functions are part of werf build.

Why the werf build process works this way

The primary objective is to render assembly more user-friendly and efficient.

In werf, the user does not have to bother with image names. werf builds the current Git commit and fills the container registry with all the missing layers. After the build is complete, the container registry is guaranteed to include all the necessary layers (that is, they are built and pushed to the registry) for the images specified in werf.yaml.

werf automatically adds tags to Docker images based on their contents — the so-called content-based tagging. Such a tagging strategy is immune to empty commits and commits that do not change the files used in the Docker image. If the build starts on the old Git commit, werf will notice that and leave the original image intact (thus, no application restart is required). This strategy prevents unnecessary re-deployments and downtime.

The resulting images and intermediate werf layers are always immutable. This ensures that a previously published layer or image will not overlap with another layer in the future. In Docker, however, the user chooses the names for the end images, so there is no such guarantee.

werf saves time by only building the images (image layers) required for the current commit that are not present in the container registry. Plus, it saves space since you have no need to store the unnecessary images in the registry.
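The principle behind content-based tagging can be sketched in a few lines of Python. This is only an illustration of the idea — the content_tag helper and the tag format are made up and do not reflect werf’s actual algorithm:

```python
import hashlib

def content_tag(files: dict[str, bytes]) -> str:
    """Derive a deterministic tag from file contents: the tag is a digest
    of everything that goes into the image, so identical inputs always map
    to the same tag (purely illustrative, not werf's real scheme)."""
    digest = hashlib.sha256()
    for path in sorted(files):          # stable order => stable digest
        digest.update(path.encode())
        digest.update(files[path])
    return digest.hexdigest()[:16]

# An empty commit, or one touching files the image does not use,
# leaves the tag intact -- so no re-deployment is triggered:
v1 = content_tag({"app.js": b"console.log('hi')", "package.json": b"{}"})
v2 = content_tag({"app.js": b"console.log('hi')", "package.json": b"{}"})
assert v1 == v2
```

Because the tag is a pure function of the inputs, rebuilding the same commit reproduces the same tag, which is exactly what makes the strategy immune to empty commits.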

4. Build context and Giterminism

When docker build is invoked, the current working directory acts as the build context. All the files in that directory can be added to the image being built via the COPY and ADD Dockerfile instructions.

In werf’s case, only the build context files from the current commit in the project repository are used. This is the essence of werf’s Giterminism mode (based on the words Git and determinism) — a more advanced implementation of GitOps. (As a reminder, the GitOps approach assumes that the current infrastructure state is stored in Git while the build and deploy processes are reproducible.) werf forces versioning for everything related to building and deploying an application and expects the entire configuration for these two processes to be stored in Git.

If a Dockerfile is used to build an image, werf prepares the archive with the build context. In the case of a Stapel image, werf mounts archives or patches to the build container (depending on the build stage and the user configuration).

werf makes configurations easy to reproduce and allows the developers to use images built by the CI system on Windows, macOS, and Linux. To reproduce a specific configuration, all you need to do is switch to the corresponding commit — and the result will be the same everywhere.

5. Automatic tagging

The werf user never needs to worry about tags. The image name from werf.yaml is all you need to refer to an application component: with it, you can run a command for a specific component or reference it in Helm templates.

werf build <IMAGE_NAME_FROM_WERF_YAML>
werf run <IMAGE_NAME_FROM_WERF_YAML>

During processing, werf inserts a set of service values (including the names of the built Docker images) into the Helm templates, which the user can then reference. Here is an example:

{{ .Values.werf.image.<IMAGE_NAME_FROM_WERF_YAML> }}

6. Running images

Docker provides the docker run and docker compose commands for running images. The first is suited to launching a single image, the second to starting and running an entire multi-container application in Docker.

With werf, all you need to do is specify the application component’s name as it appears in werf.yaml to run it: werf run <IMAGE_NAME_FROM_WERF_YAML>.

You can also use images built by werf in Docker Compose configurations. To do so, you can use the environment variables reserved for images as well as the werf commands: werf compose config|down|up|run.

version: '3.8'
services:
  web:
    image: ${WERF_APP_DOCKER_IMAGE_NAME}
    ports:
      - published: 5000
        target: 5000

Docker Compose is a powerful tool that is not limited to local development and testing. You can use it in any environment as long as it meets your needs and expectations.

This begs the following questions: What tool should I use, Docker Compose or Kubernetes? What are the pros and cons of these environments? Why is werf focused primarily on Kubernetes? The answers to the above questions are beyond the scope of this piece and merit a separate article.

7. Parallel builds

Parallel image assembly significantly speeds up the building process: independent images are assembled simultaneously, and independent build stages also run concurrently.

Docker supports parallel builds when the BuildKit feature is enabled. By itself, Docker can only build one image per docker build invocation; BuildKit additionally parallelizes the independent stages of a multi-stage build.

In werf, you can explicitly enable BuildKit (DOCKER_BUILDKIT=1). However, unlike Docker, werf can simultaneously build all the application images itself. For example, you can build:

  • any number of targets using one or multiple Dockerfiles;
  • any number of Stapel images.
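For example, a werf.yaml defining several independent images (the names and directory layout below are illustrative) lets werf assemble them all in parallel within a single werf build run:

```yaml
project: my-project
configVersion: 1
---
image: backend
dockerfile: Dockerfile
context: backend
---
image: frontend
dockerfile: Dockerfile
context: frontend
```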

8. Distributed build

Here, the term “distributed” means that multiple builders can work cooperatively. Thanks to synchronization and locking mechanisms, they effectively reuse common layers and preserve the reproducibility of all of the builds.

werf builds a layer, checks whether it is unique, and pushes it to the container registry. During the build, werf scans the local storage or container registry for an existing stage. If one is found, werf uses the existing image instead of building a new one from scratch. The algorithm is somewhat similar to Docker caching but more complex and sophisticated. werf implements an MVCC (multi-version concurrency control) mechanism with optimistic locking.
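A rough sketch of this build-or-reuse logic, with the shared registry reduced to a dictionary (the stage_digest and build_or_reuse helpers are hypothetical, and locking is ignored entirely):

```python
import hashlib

def stage_digest(parent: str, instructions: list[str]) -> str:
    """Name a stage by a digest of its parent plus its instructions,
    so identical stages get identical names across builders."""
    h = hashlib.sha256(parent.encode())
    for line in instructions:
        h.update(line.encode())
    return h.hexdigest()[:12]

def build_or_reuse(registry: dict, parent: str, instructions: list[str]):
    """Reuse the stage from the shared registry if present; otherwise
    'build' it and publish it so other builders can reuse it."""
    name = stage_digest(parent, instructions)
    if name in registry:
        return name, True             # another builder already made it
    registry[name] = instructions     # stand-in for an actual image push
    return name, False

registry = {}
s1, reused1 = build_or_reuse(registry, "base", ["apt update", "npm ci"])
s2, reused2 = build_or_reuse(registry, "base", ["apt update", "npm ci"])
assert s1 == s2 and not reused1 and reused2
```

The second builder finds the stage already published and skips the build, which is what makes cooperative, distributed assembly pay off.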

9. Debugging images

werf provides a so-called stage introspection tool for debugging the build process. Using introspection, you can analyze specific stages of the build process, identify possible errors in assembly instructions, or discover the causes of unexpected results of the build.

Furthermore, during the build process, you can access a specific stage. During introspection, the container gets to a state that is identical to that of the real build. It has the same environment variables and utility tools that werf uses during assembly. In essence, introspection involves running an assembly container interactively to work within it.

You can think of it as a test run of some sort: first, you get the result in the build container, then you optimize the build configuration in werf.yaml. Introspection may come in handy in the development process when the need arises to play around with some assembly steps.

10. Cleaning up the host

Docker has commands that allow you to selectively remove images, containers, and other resource types: docker rm CONTAINER_ID [CONTAINER_ID ...], docker rmi IMAGE_ID [IMAGE_ID ...]. It also provides a universal command to clean up unused data: docker system prune.

On the other hand, werf has the host cleanup feature enabled by default. The cleanup is performed as part of the general commands (werf converge and werf build). werf cleans up temporary data, caches, and local Docker images. The algorithm takes into account the age of outdated files and the amount of disk space they occupy. When the threshold is exceeded, werf automatically deletes outdated files, starting with the oldest ones.
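The oldest-first eviction idea can be sketched as follows (a toy model: werf’s real accounting of ages, sizes, and thresholds is more involved):

```python
def cleanup(artifacts, threshold_bytes):
    """Delete the oldest artifacts first until the total size fits the
    threshold. artifacts: list of (age_days, size_bytes) tuples."""
    kept = sorted(artifacts, key=lambda a: a[0])   # youngest first
    while sum(size for _, size in kept) > threshold_bytes:
        kept.pop()                                 # drop the oldest
    return kept

stale = [(1, 100), (30, 400), (90, 700)]
# 1200 bytes total exceeds the 600-byte threshold, so the oldest
# artifact (90 days, 700 bytes) is evicted first:
assert cleanup(stale, 600) == [(1, 100), (30, 400)]
```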

You can use the werf host cleanup command to clean up the host manually. The werf host purge command cleans up all werf-related data on the host.

11. Cleaning up the container registry

The container registry usually stores images that are used on a regular basis. Over time, however, the registry gets flooded with outdated, irrelevant images that eat up tens of gigabytes (or even terabytes) of disk space. This can become a problem if the registry is hosted on AWS or a similar fee-based solution.

Docker, unlike werf, does not provide a container registry cleanup feature. You have to delete each irrelevant image manually or use the tools specific to the container registry itself (if any are available at all).

In werf, the werf cleanup command is designed to run on a schedule. The cleanup procedure is safe and conforms to established cleanup policies. During cleanup, werf accounts for images used in the Kubernetes cluster, newly built images, and user-defined policies that reflect team workflows and are tied to Git (see this article and the documentation for details).
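Such policies are declared in werf.yaml. Here is a hedged sketch of what a configuration may look like (the reference patterns and retention numbers below are purely illustrative):

```yaml
cleanup:
  keepPolicies:
  # keep the last 10 images built for any tag
  - references:
      tag: /.*/
    imagesPerReference:
      last: 10
  # for each of the 10 most recently active branches,
  # keep the last 2 images built in the past week
  - references:
      branch: /.*/
      limit:
        last: 10
    imagesPerReference:
      last: 2
      in: 168h
```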

The complete cleanup of the container registry can only be carried out manually via the werf purge command.

Plans to optimize the build process in werf v1.3

All the features we discussed above apply to the current version of werf (v1.2). However, in the next version, we intend to make another stride in optimizing the build process.

  1. Docker does not support building within userspace (unlike, for example, Kaniko and Buildah). As a result, the current werf version does not support userspace builds either. Since this feature is growing in popularity, in werf v1.3, we plan to get rid of the Docker daemon and add the feature of building within userspace.
  2. We plan to implement custom pipelines for werf stages:
  • treat each Dockerfile instruction as a separate werf stage instead of using a single stage for all the instructions;
  • allow the user to choose stages freely for the Stapel-based assembly (currently, the Stapel builder can only use a fixed set of stages).
  3. We intend to build upon Dockerfile capabilities while preserving compatibility with Docker. First and foremost, we plan to expand integration with Git and render it similar to that of the Stapel builder.
  4. We plan to integrate the features implemented in the Stapel builder into Dockerfile-based builds:
  • building a Dockerfile image based on another Dockerfile/Stapel image defined in werf.yaml;
  • importing artifacts from other Dockerfile/Stapel images defined in werf.yaml for multi-stage support between multiple Dockerfiles and integration with Stapel images.

Summary

A direct comparison of Docker and werf (as in the case of Helm vs. werf) is like comparing apples and oranges, since werf supports the entire cycle of application delivery to Kubernetes. Image assembly is integrated into this process and automated as much as possible.

Docker is an established build tool with a wide range of capabilities for working with images and containers, and it does its job well. werf bases its build process on Docker while making improvements along the way and integrating build results with other steps in the CI/CD pipeline, such as running images, deploying the application, and cleaning up the container registry. Our tool saves a lot of time by eliminating low-level tasks.