17 April 2020
Timofey Kirillov, software developer

Content-based tagging and its implementation in the werf builder

werf is our Open Source GitOps tool to build your applications and deploy them to Kubernetes. The v1.1 release introduced a new feature in the image builder: content-based tagging. Until now, the typical tagging strategy in werf involved tagging Docker images by a Git tag, Git branch, or Git commit. However, all these strategies have drawbacks that the new tagging strategy fully resolves. In this article, we discuss those drawbacks and the advantages of the new approach.

Deploying multiple microservices from a single Git repository

Frequently, an application is divided into multiple more or less independent services. These services can be released individually, with one or more of them being released at a time. Meanwhile, other services must continue to run as before. However, from the standpoint of storing the source code and managing the project, it makes sense to keep these application services in a single repository.

Of course, there are situations when services are truly independent and are not related to a single application. In this case, they will be stored in separate projects and released through unique CI/CD processes in each of the projects.

However, in real life, developers often divide a single application into several microservices. At the same time, they are reluctant to create a separate project/repository for each microservice. That is precisely the situation we will discuss in this article: several independent microservices are stored in a single project repository and combined into a common CI/CD process.

Setting tags based on Git branches and Git tags

Suppose we use the most common tagging strategy, tag-or-branch. In case of Git branches, images are tagged using the name of the branch; only one published image can exist for each branch at a time. In case of Git tags, images are tagged with the name of the tag.

Whenever a new Git tag emerges (for example, when a new version is released), a new Docker tag is created for all images of the project:

  • myregistry.org/myproject/frontend:v1.1.10
  • myregistry.org/myproject/myservice1:v1.1.10
  • myregistry.org/myproject/myservice2:v1.1.10
  • myregistry.org/myproject/myservice3:v1.1.10
  • myregistry.org/myproject/myservice4:v1.1.10
  • myregistry.org/myproject/myservice5:v1.1.10
  • myregistry.org/myproject/database:v1.1.10

These new image names are passed via Helm templates into the Kubernetes configuration. When the deployment process is started with werf deploy, werf updates the image field in the manifests of the Kubernetes resources. The corresponding resources are then restarted because the image name has changed.

However, if the contents of an image have not changed since the last time the Git tag was deployed, and the only thing that has been changed is the Docker tag, then an application is needlessly restarted. In this case, downtime is possible, although there was no reason for a restart.
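The effect can be illustrated with a minimal sketch. It is not werf code: the helper names, the previous version v1.1.9, and the assumption that a workload is rolled out again whenever its image field changes are all introduced here for illustration only.

```python
# Hypothetical illustration: under tag-or-branch tagging, every image
# reference changes with a new Git tag, whether or not its contents changed.

REPO = "myregistry.org/myproject"

def image_ref(service: str, git_tag: str) -> str:
    """Build an image reference the way the tag-or-branch strategy names it."""
    return f"{REPO}/{service}:{git_tag}"

def needs_restart(old_ref: str, new_ref: str) -> bool:
    """Assumption: a resource is restarted whenever its image field changes."""
    return old_ref != new_ref

services = ["frontend", "myservice1", "database"]

# Suppose only the frontend code changed between v1.1.9 (hypothetical) and v1.1.10.
restarted = [
    service
    for service in services
    if needs_restart(image_ref(service, "v1.1.9"), image_ref(service, "v1.1.10"))
]
print(restarted)  # every service is listed, not only the frontend
```

In this sketch, all three services are restarted even though only one of them has new contents.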

As a result, to avoid such restarts under this tagging strategy, you have to split the services into several separate Git repositories, which in turn raises the problem of coordinating deployments across them. Overall, the scheme turns out to be overly complicated. The preferred approach is to combine all services into a single repository and create Docker tags in such a way as to avoid unnecessary restarts.

Tagging by Git commit

werf also has a Git commit-based tagging strategy.

The Git commit serves as an identifier of the Git repository contents and depends on the history of edits in the repository, so it seems logical to use it for tagging images in the Docker Registry.

However, the Git commit-based tagging scheme has the same weaknesses as the strategies based on Git branches and tags:

  • There might be an empty commit that does not change the contents of an image; however, the Docker tag of an image would change.
  • The above is also true for a merge commit.
  • A commit might change only those files in Git that are not imported into an image; in this case, the Docker tag would still change.
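The three cases above can be reproduced with a small sketch that compares a commit identifier with a digest of only those files that are imported into the image. The commit IDs, file names, and the content_tag helper are hypothetical, and SHA-256 is used purely for illustration; this is not how werf computes its tags.

```python
import hashlib

def content_tag(files: dict[str, bytes], image_paths: tuple[str, ...]) -> str:
    """Digest of only those files that end up inside the image."""
    digest = hashlib.sha256()
    for path in sorted(files):
        if path.startswith(image_paths):
            for chunk in (path.encode(), files[path]):
                # Length-prefix each chunk so that boundaries are unambiguous.
                digest.update(len(chunk).to_bytes(8, "big"))
                digest.update(chunk)
    return digest.hexdigest()

# (hypothetical commit id, repository contents at that commit)
history = [
    ("a1", {"app/main.py": b"print('hi')", "README.md": b"v1"}),
    ("b2", {"app/main.py": b"print('hi')", "README.md": b"v1"}),   # empty commit
    ("c3", {"app/main.py": b"print('hi')", "README.md": b"v2"}),   # docs-only commit
    ("d4", {"app/main.py": b"print('bye')", "README.md": b"v2"}),  # code change
]

commit_tags = [commit for commit, _ in history]
content_tags = [content_tag(files, ("app/",)) for _, files in history]

for commit, tag in zip(commit_tags, content_tags):
    print(commit, tag[:12])
```

Every commit yields a new commit-based tag, while the content-derived value changes only for the last commit, the one that actually touches a file under app/.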

Git branch-based tagging does not reflect the version of an image

There is another problem related to the Git branch-based tagging strategy.

Tagging by a branch name works fine as long as commits of this branch are kept in chronological order.

If, under the current scheme, the user reruns the build for an old commit of some branch, werf will replace the image attached to the corresponding Docker tag with the newly built image for that old commit. From then on, deployments that use this tag are at risk of pulling the wrong version of the image when pods are restarted. As a result, the application will no longer be in sync with the CI system.

Also, if there are consecutive pushes into the same branch at short intervals, the image for the older commit might finish building later than the image for the newer commit. In this case, the older version of the image would overwrite the newer one. A full-fledged CI/CD system can solve the above problems (e.g., in the case of a series of commits, GitLab CI starts the pipeline for the latest one). However, not all CI systems support that functionality. There has to be a more reliable way to prevent such a fundamental problem.
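A toy model of this race, with the registry reduced to a dictionary, may make the difference clearer. The commit names and signatures are placeholders, and the second dictionary only anticipates the idea of a tag derived from the image itself rather than from the branch name.

```python
# Hypothetical build jobs for two consecutive pushes to the branch "main".
# Each entry: (commit, placeholder for a tag derived from the built image).
pushed = [("commit-1", "sig-aaa"), ("commit-2", "sig-bbb")]

# The build for the newer commit happens to finish first.
finished = [pushed[1], pushed[0]]

branch_registry: dict[str, str] = {}   # mutable branch tag -> image
content_registry: dict[str, str] = {}  # content-derived tag -> image

for commit, signature in finished:
    branch_registry["main"] = f"image of {commit}"      # overwrites the previous image
    content_registry[signature] = f"image of {commit}"  # adds a separate tag

print(branch_registry["main"])      # the older image now sits behind the branch tag
print(content_registry["sig-bbb"])  # the newer image is still reachable by its own tag
```

With a single mutable tag per branch, the result depends on which build finishes last; with one tag per distinct image, nothing is overwritten.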

What is content-based tagging?

That is how we arrive at content-based tagging.

Under this strategy, werf uses a checksum instead of Git primitives (such as Git branches/tags) to create associated Docker tags. The checksum depends on:

  • image contents. The identifier tag of the image reflects its contents. When a new version is built, the identifier stays the same if the files in the image have not changed;
  • the history of this image in Git. Images associated with different Git branches that have a different build history in werf will have different identifier tags.

The so-called stages-signature of an image acts as an identifier tag.

Each image consists of a set of stages: from, before-install, git-archive, install, imports-after-install, before-setup, … git-latest-patch, and so on. Each stage has an identifier that reflects its contents: the stage signature.

The final image (consisting of these stages) is tagged with the so-called stages-signature of a set of stages. This signature aggregates all stages of an image.

In the general case, each image defined in the werf.yaml configuration file will have its own stages-signature and, accordingly, its own Docker tag.
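The idea of aggregating per-stage signatures into one image-level identifier can be sketched as follows. This is an illustration under stated assumptions, not werf's actual algorithm: the stage inputs are invented, the Git history component mentioned above is not modeled, and SHA3-224 is chosen only because it produces 56 hexadecimal characters, the same length as the example tags shown later in this article.

```python
import hashlib

def stage_signature(name: str, inputs: bytes) -> str:
    """Identifier of a single stage, derived from what the stage is built from."""
    return hashlib.sha3_224(name.encode() + b"\0" + inputs).hexdigest()

def stages_signature(stages: list[tuple[str, bytes]]) -> str:
    """Aggregate the ordered per-stage signatures into one image-level identifier."""
    aggregate = hashlib.sha3_224()
    for name, inputs in stages:
        aggregate.update(stage_signature(name, inputs).encode())
    return aggregate.hexdigest()

# Stage names are taken from the list above; their inputs are hypothetical.
backend = [
    ("from", b"ubuntu:18.04"),
    ("install", b"apt-get install -y python3"),
    ("git-latest-patch", b"diff of tracked application files"),
]
tag = stages_signature(backend)

# Rebuilding from identical inputs gives the same tag; changing one stage does not.
changed = backend[:2] + [("git-latest-patch", b"a different diff")]
print(tag)
print(stages_signature(backend) == tag, stages_signature(changed) == tag)
```

Because the tag is a pure function of the stage inputs, rebuilding an unchanged image reproduces the same tag, which is the property the rest of this article relies on.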

The stages-signature solves all the problems described above:

  • It is resistant to empty Git commits.
  • It is unaffected by Git commits that change files irrelevant to the image.
  • It is not subject to the problem of overwriting the current version of the image when restarting builds for old Git commits of the branch.

It is currently the recommended tagging strategy, and werf uses it by default for all CI systems.

How to enable and use it in werf

The werf publish command now has the corresponding option: --tag-by-stages-signature=true|false

In the CI system, the tagging strategy is set by the werf ci-env command. Previously, it was called with the werf ci-env --tagging-strategy=tag-or-branch parameter. Now you can specify werf ci-env --tagging-strategy=stages-signature or even omit this parameter, since werf uses the stages-signature strategy by default. The werf ci-env command automatically sets the appropriate flags for the werf build-and-publish (and werf publish) command, so you don’t need to specify any additional options for these commands.

For example, the command:

werf publish --stages-storage :local --images-repo registry.hello.com/web/core/system --tag-by-stages-signature

… can create the following images:

  • registry.hello.com/web/core/system/backend:4ef339f84ca22247f01fb335bb19f46c4434014d8daa3d5d6f0e386d
  • registry.hello.com/web/core/system/frontend:f44206457e0a4c8a54655543f749799d10a9fe945896dab1c16996c6

Here, 4ef339… is the signature of stages of the backend image, and f44206… is the signature of stages of the frontend image.

Thanks to werf_container_image and werf_container_env, you do not have to make any changes to Helm templates: these functions automatically generate the correct image names.

Example of the configuration in a CI system:

type multiwerf && source <(multiwerf use 1.1 beta)
type werf && source <(werf ci-env gitlab)
werf build-and-publish|deploy

Additional information is available in the werf documentation.

Conclusion

Here is a summary of our efforts:

  • The new werf publish --tag-by-stages-signature=true|false parameter.
  • The new value for the werf ci-env parameter (--tagging-strategy=stages-signature|tag-or-branch). By default, the stages-signature strategy is used.
  • If you have previously used the Git commit tagging options (WERF_TAG_GIT_COMMIT or the werf publish --tag-git-commit COMMIT option), you have to switch to the stages-signature tagging strategy.
  • For new projects, it is better to use the new tagging scheme from the very beginning.
  • You can switch the existing projects to the new tagging strategy when updating to werf 1.1. At the same time, the former tag-or-branch approach is supported as well.

The content-based tagging strategy solves all the problems mentioned in this article. It ensures:

  • The immunity of the name of the Docker tag to empty Git commits.
  • The immunity of the name of the Docker tag to Git commits that change only files irrelevant to the image.
  • The alleviation of the problem of overwriting the current version of the image when restarting builds for old Git commits for Git branches.

Enjoy it! And do not forget to visit our GitHub repository. There, you can open an issue or find an existing one, up-vote it, create a PR, or take a closer look at the project.

Afterword

This article was originally posted on Medium. New texts from our engineers are published here, on blog.flant.com.