We are pleased to present our new Open Source project. This time, we have created a tiny tool that should be useful for most Kubernetes installations. So, what is the deal? K8s-image-availability-exporter (or k8s-iae for short) is a Prometheus exporter that proactively warns you about images that are defined in Kubernetes objects (e.g., the `image` field of a Deployment) but are not available in the container registry (such as Docker Registry).
Kubernetes is an extremely dynamic system. When operating infrastructure in a K8s cluster, we always assume that any pod (or even a node!) might be deleted at any moment. To improve resilience, we test the system using various chaos engineering approaches. Mainly, we randomly kill Kubernetes nodes to see whether our applications are ready for pod restarts.
For the application to work correctly in such dynamic Kubernetes realities, it has to follow some basic rules such as creating PodDisruptionBudgets, deploying several replicas of the application simultaneously, correctly configuring podAffinity, nodeAffinity, and so on.
However, obvious as these rules are, we cannot make all our customers apply them at all times. In real life, we often face various peculiarities, such as a customer’s application that runs as a single replica. It has been running like that for months with hardly any redeployments. The developers carry out these rare redeployments by themselves, without involving werf (a tool that can do this automatically). At the same time, the registry is configured to automatically delete all old images. Sooner or later the pod has to be restarted, its image can no longer be pulled, and you end up with an outage.
Recently, we encountered exactly such a case: a routine rescheduling operation in the Kubernetes cluster caused an hour-long downtime while we were looking for a person who could build the application. Frustrated by the situation, we decided to make k8s-image-availability-exporter. The idea is to automate the needed checks and prevent such situations from happening, regardless of compliance with organizational policies and other “random” factors.
Its general algorithm is as follows:
- Five informers are launched. They let us keep an in-memory copy of the state of the following cluster objects: Deployments, StatefulSets, DaemonSets, CronJobs, and Secrets.
- All container images used in PodTemplates are grouped and placed into the priority queue.
- Every 15 seconds (this interval is configured with the `--check-period` option), we take another batch of images* out of the priority queue, namely those that have not been checked for the longest time, and check whether they are available in the container registry.
- If there is an `imagePullSecrets` parameter in the `PodSpec`, we get the credentials from the corresponding Secrets.
- With these credentials (in the case of `imagePullSecrets`) or without them, we connect to the container registry and check whether the required image exists.
* The number of images in a batch is adjusted dynamically so that all of the cluster’s images are checked within 10 minutes.
As a result, the following metrics are exported (`TYPE` stands for the kind of object that references the image, e.g. `deployment`):
- `k8s_image_availability_exporter_TYPE_available`: non-zero indicates a successful image check;
- `k8s_image_availability_exporter_TYPE_bad_image_format`: non-zero indicates an incorrectly formatted image reference;
- `k8s_image_availability_exporter_TYPE_absent`: non-zero indicates that the image’s manifest is absent from the container registry;
- `k8s_image_availability_exporter_TYPE_registry_unavailable`: non-zero indicates general registry unavailability, possibly due to a network outage;
- `k8s_image_availability_exporter_deployment_registry_v1_api_not_supported`: non-zero indicates a registry that uses the v1 Docker Registry API; such images are best excluded from monitoring via the list of ignored images;
- `k8s_image_availability_exporter_TYPE_authentication_failure`: non-zero indicates an authentication error while connecting to your container registry;
- `k8s_image_availability_exporter_TYPE_authorization_failure`: non-zero indicates an authorization error while connecting to your container registry;
- `k8s_image_availability_exporter_TYPE_unknown_error`: non-zero indicates an error that could not be classified; consult the exporter’s logs for additional information;
- `k8s_image_availability_exporter_completed_rechecks_total`: increases with each completed cycle of checking images in the registry.
The final touch: k8s-iae has a customizable list of images that should be ignored during monitoring.
The code is written in Go, distributed under the Apache 2.0 license, and is available on GitHub. Any suggestions for improving it or remarks regarding its use are welcome!
We have already adopted k8s-image-availability-exporter on dozens of Kubernetes clusters and are very pleased with the results. However, note that k8s-iae is currently in alpha, so use it at your own risk.
Getting started takes three steps:
1. Clone the repository: `git clone https://github.com/flant/k8s-image-availability-exporter.git`
2. Apply the manifests from its `deploy/` directory: `kubectl apply -f deploy/`
3. Configure integration with Prometheus: scraping and alerting rules.
You can find more information (including examples of ready-to-use Prometheus configurations) in the README.
k8s-image-availability-exporter allows the user to receive alerts when container images related to running Kubernetes controllers are missing in the registry. This helps you to solve the problem before it manifests itself.
The alternative approaches include:
- simply abandoning cleaning the registry;
- using tools that solve this problem “automagically” (as is the case with the werf GitOps CLI tool).