Announcing addon-operator to simplify managing additional components in K8s clusters
Following our recent introduction of shell-operator, we are happy to announce its older brother: addon-operator. It is an open source project that helps you install system components (also known as addons) into Kubernetes clusters, configure them, and keep them up to date.
Do we really need addons?
As you know, Kubernetes itself is not a complete, one-fits-all solution. To build a serious cluster, you have to rely on various additional components:
- For simple K8s installations used for learning or quick experiments only, you might be satisfied with the basic components available out of the box.
- For development and testing purposes, you might want to add an Ingress controller.
- More complex, production-ready environments will require much more: perhaps dozens of different addons, including monitoring & logging tools, cert-manager, specific addons for node allocation, network policies, sysctl configuration, pod autoscaling, and so on.
What is so special about addons?
In most cases, a basic installation of addons is not enough. You also have to update them, disable them (delete them from the cluster), and sometimes you even want to test them before installing into the production cluster.
Aren’t systems like Ansible well suited for such tasks? Well, maybe… However, full-fledged addons generally cannot exist without settings, and these settings may differ depending on the cluster type: AWS, GCE, Azure, bare metal, DO… Moreover, some of them can’t be set in advance since they have to be derived from the cluster itself. And the cluster is not static: you have to continuously monitor the changes that affect some of these settings. So Ansible is not enough in this case: you need a program that operates within the cluster, that is, a Kubernetes operator.
If you have tried our shell-operator, you know that installing and updating addons as well as monitoring of settings can be implemented via shell-operator hooks. For example, you can create a script that executes kubectl apply commands and monitors the ConfigMap where settings are stored. It is quite similar to what is implemented in addon-operator.
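For illustration, here is a minimal sketch of such a hook (the binding config format follows the current shell-operator documentation; the ConfigMap name and manifest path are hypothetical):

```shell
#!/usr/bin/env bash
set -euo pipefail

# When invoked with --config, a shell-operator hook prints its bindings.
# Here we subscribe to changes of a hypothetical ConfigMap with settings.
hook_config() {
cat <<'EOF'
configVersion: v1
kubernetes:
- apiVersion: v1
  kind: ConfigMap
  nameSelector:
    matchNames: [my-addon-settings]
EOF
}

if [[ "${1:-}" == "--config" ]]; then
  hook_config
  exit 0
fi

# On any other invocation the hook is handling an event:
# re-apply the addon's manifests with the current settings.
if command -v kubectl >/dev/null && [[ -d /addons/my-addon/manifests ]]; then
  kubectl apply -f /addons/my-addon/manifests/
fi
```

Shell-operator runs the script once with `--config` to learn what to subscribe to, and then re-runs it on every matching event.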
What is actually implemented in addon-operator?
Creating our solution, we have been following these important principles:
- The addon installer must support templating and declarative configuration. “Magic” scripts that install addons are entirely abandoned. Addon-operator uses Helm to install addons. All you need is to create a chart and define the values used for configuration.
- Settings can be generated during installation, discovered from the cluster, or updated to reflect changes in the cluster’s resources. This functionality is implemented via hooks.
- Settings can be stored within the cluster. For this purpose, the ConfigMap/addon-operator object is created and addon-operator monitors changes made in that ConfigMap. Addon-operator makes these settings available in hooks via simple conventions.
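As an illustration, such a ConfigMap might look like this (the namespace and the `mymodule` section are hypothetical; module values are stored as YAML strings under keys named after the modules):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: addon-operator
  namespace: addon-operator   # hypothetical namespace
data:
  global: |                   # values visible to all modules
    project: my-project
  mymodule: |                 # values for a hypothetical module
    replicas: 2
```

Editing this ConfigMap is enough to change a module’s settings: addon-operator watches it and reacts to the change.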
- An addon depends on its settings. If the settings change, addon-operator rolls out the Helm chart with the new values. The combination of a Helm chart, its values, and hooks is called a module (details are provided below).
- Staging. There are no “magic” release scripts. The update mechanism is the same as for an ordinary application: you build an image containing the addons and the addon-operator binary, tag it, and deploy it into the cluster.
- Monitoring. Addon-operator implements the /metrics endpoint to expose metrics for Prometheus.
What is an addon in addon-operator?
An addon is any code that adds new functionality to a K8s cluster. For example, an Ingress installation is a perfect example of an addon. Any operator or controller that has its own CRD — e.g. prometheus-operator, cert-manager, kube-controller-manager — may be considered an addon. Lightweight scripts simplifying routine tasks — e.g. copying registry secrets to new namespaces or fine-tuning sysctl parameters for new nodes — are also considered addons.
Addon-operator provides several concepts for implementing addons:
- Helm chart is used for installing various software into the cluster, e.g. Prometheus, Grafana, nginx-ingress. If there is a Helm chart for any desired component, it can be easily installed with addon-operator.
- Values storage. Helm charts usually have many settings that can change over time. Addon-operator maintains the storage of these settings and monitors their changes in order to re-install a Helm chart with new values.
- Hooks are executable files that addon-operator runs when events occur in the cluster. Hooks have access to the values storage; they can monitor cluster changes and update values in the values storage. Hooks also allow discovering cluster values at startup or on a schedule, and you can implement a continuous values discovery mechanism that picks up new values as your cluster changes.
- Module combines a Helm chart, values storage, and hooks. You can enable and disable modules; disabling a module removes all releases of its Helm chart. Modules can also enable themselves dynamically: for example, an auxiliary enabled script turns the module on if all the modules it depends on are enabled, or if hooks have discovered the required parameters in the cluster.
- Global hooks are “independent”: they are not included in modules and can read and modify values in the global values storage, which is accessible to all module hooks.
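To make these concepts more tangible, here is a sketch of how a module’s files might be laid out on disk (the exact layout conventions are described in the addon-operator documentation; the names here are illustrative):

```
modules/
  my-module/
    Chart.yaml               # the module's Helm chart
    values.yaml              # default values
    templates/               # chart templates
    hooks/
      discover-settings.sh   # module hooks
    enabled                  # optional script deciding whether the module runs
global-hooks/
  cluster-discovery.sh       # hooks shared by all modules
```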
So how do these parts operate together? Let’s look at the picture from the documentation:
There are two operating scenarios:
- A global hook is triggered by an event, e.g. in response to a resource change in the Kubernetes cluster. The hook handles the change and writes new values to the global values storage. Addon-operator notices the change in the global storage and runs all modules. With the help of its hooks, each module determines whether it should be started and updates its values storage. If the module is enabled, addon-operator installs (or upgrades) the corresponding Helm chart. Here, the Helm chart can access values from both the module’s storage and the global storage.
- The second scenario is simpler: a module hook is triggered by an event and changes values in the module’s values storage. Addon-operator notices this and re-installs the Helm chart with the updated values.
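A module hook of the second kind could be sketched like this (the `VALUES_JSON_PATCH_PATH` environment variable, the schedule binding format, and the `mymodule` path are assumptions for the example, based on the addon-operator conventions):

```shell
#!/usr/bin/env bash
set -euo pipefail

# When invoked with --config, report the bindings: here, a periodic run.
if [[ "${1:-}" == "--config" ]]; then
cat <<'EOF'
configVersion: v1
schedule:
- crontab: "*/10 * * * *"
EOF
  exit 0
fi

# Discover a value (here hardcoded for brevity) and write it to the
# module's values storage via the JSON patch file that addon-operator
# provides through an environment variable.
write_replica_count() {
  local count="$1"
  cat <<EOF >> "${VALUES_JSON_PATCH_PATH:-/dev/stdout}"
[{"op": "add", "path": "/mymodule/replicas", "value": ${count}}]
EOF
}

write_replica_count 3
```

After the hook finishes, addon-operator applies the patch to the module’s values and, since they changed, upgrades the Helm release.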
An addon can be implemented as a single hook, a single Helm chart, or even several interdependent modules. Its actual implementation depends on the complexity of the component being installed into the K8s cluster and the desired level of flexibility of its settings. For example, there is a sysctl-tuner addon in the /examples directory that comes in two forms: a) a simple module with a hook and a Helm chart; b) a module that uses values storage (which allows us to add settings by editing a ConfigMap).
How are the components installed by addon-operator updated?
To run addon-operator in the cluster, you have to build an image that contains:
- addons (hook files and Helm charts);
- addon-operator binary file;
- all third-party components required by hooks: bash, kubectl, jq, python, etc.
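A minimal Dockerfile for such an image might look like this (flant/addon-operator is the project’s published base image; the tag, paths, and extra packages are illustrative):

```dockerfile
# Base image containing the addon-operator binary (tag is illustrative)
FROM flant/addon-operator:latest

# Hooks may need extra tools; this assumes an Alpine-based image
RUN apk --no-cache add bash jq curl

# Add our modules (Helm charts + hooks) and global hooks
COPY modules /modules
COPY global-hooks /global-hooks
```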
Then you can deploy this image into your K8s cluster as an ordinary application… and you will probably want to adopt some tagging scheme for this process. If you have a handful of clusters, you can use the same approach as with ordinary applications: new release, new version, go through all clusters, and modify the image in your Pods. However, in the case of a rolling update over a large number of clusters, you can use the concept of self-updating from a channel.
We have implemented the latter in the following way:
- A channel is essentially an identifier that can have any meaningful value. We use values like ea and stable for that.
- The channel name is the tag of the image. When you need to roll updates to the channel, a new image is built and tagged with the name of the channel.
- When a new image appears in the registry, addon-operator is restarted using this new image.
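For example, rolling an update out to the stable channel could look like this (the image name and registry are illustrative):

```
# build an image with the updated addons and tag it with the channel name
$ docker build -t registry.example.com/addons:stable .
$ docker push registry.example.com/addons:stable
# clusters subscribed to the "stable" channel notice the new image
# in the registry and restart addon-operator with it
```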
As the Kubernetes documentation notes, deploying via a mutable tag is not among the best practices, and such an approach is indeed not recommended for an ordinary application that operates in a single cluster. In the case of addon-operator, however, the application consists of many Deployments dispersed across clusters, so the self-updating feature makes life much easier.
Channels also help with testing. You can point an auxiliary cluster to the stage channel and test-roll updates there before promoting them to the stable channel. If an error occurs in a cluster using the ea channel, you can switch it to the stable channel while the issue is being resolved. If a cluster isn’t actively supported, it is switched to its own “frozen” channel.
In addition to updating hooks and Helm charts, you may need to update a third-party component. For example, you notice an error in node-exporter and even figure out how to patch it. Then you open a PR and wait for a new release before changing the version of the image in all of your clusters. To reduce the waiting time, you can build your own node-exporter and switch to it until your PR is accepted.
Frankly speaking, you can do all that without addon-operator; however, it makes the whole process more straightforward and transparent. When the module for installing node-exporter and the Dockerfile for building your own image are placed in the same repository, it’s easier to see what is going on. If you run several clusters, testing your PR and rolling the new version out is also easier.
Such an approach to component updates works perfectly in our case. However, you can implement any other suitable scheme since addon-operator is just a binary file and doesn’t depend on the chosen mechanism of updating.
The principles implemented in addon-operator allow you to build a transparent process for creating, testing, installing and updating addons in Kubernetes cluster — it is similar to the process of developing ordinary applications.
Addons for addon-operator made as modules (a Helm chart + hooks) can be easily shared with others. At Flant, we plan to share our collection of modules in the coming months.