12 February 2021
Ivan Mikheykin, software engineer

shell-operator & addon-operator news: hooks as admission webhooks, Helm 3, OpenAPI, Go hooks, and more!

Shell-operator and addon-operator are Open Source projects developed by Flant for Kubernetes administrators. They were first introduced in April 2019.

  • Shell-operator simplifies the creation of K8s operators: it allows you to use custom scripts written in Bash, Python, etc. (or any executables) that will be executed when some event happens in the Kubernetes API.
  • Addon-operator is its “big bro.” It simplifies the installation of Helm charts into the cluster by using shell-operator’s hooks to configure them.

In the previous article, we described the capabilities of shell-operator as of the v1.0.0-beta.11 release (announced last summer). There was also the KubeCon EU’2020 presentation, although it was mostly aimed at those new to shell-operator. (By the way, we still recommend it if you want to understand how shell-operator makes life easier and simplifies the creation of K8s operators: it provides some easy-to-grasp examples of its use.)

Since then, shell-operator and addon-operator have gained many new features, and this article covers them.

Noticeable changes in shell-operator v1.0.0-rc1

You can now use shell-operator hooks as handlers for a ValidatingWebhookConfiguration (the resource that configures a validating admission webhook). In other words, a hook can inspect a Kubernetes resource while it is being created or edited and reject the operation if the resource does not meet some rules. For example, such a rule might read: “You can create a resource only if its image comes from repo.example.com.” You can find an example of implementing such a policy in the 204-validating-webhook directory. Shell-operator supports this kind of hook on Kubernetes 1.16 or later.

An example of the config function for such a hook is provided below (clipped from the shell hook mentioned above):

function __config__(){
    cat <<EOF
configVersion: v1
kubernetesValidating:
- name: private-repo-policy.example.com
  namespace:
    labelSelector:
      matchLabels:
        # helm adds a 'name' label to a namespace it creates
        name: example-204
  rules:
  - apiGroups:   ["stable.example.com"]
    apiVersions: ["v1"]
    operations:  ["CREATE", "UPDATE"]
    resources:   ["crontabs"]
    scope:       "Namespaced"
EOF
}
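To make the policy itself more tangible, here is a minimal sketch of the decision such a hook has to make. It is written in Go purely for illustration (a hook can be any executable); the function name checkImage and the allowed/message response fields are our assumptions for this sketch, not code taken from the 204-validating-webhook example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// response is the verdict a validating hook hands back.
// The field names are an assumption made for this sketch.
type response struct {
	Allowed bool   `json:"allowed"`
	Message string `json:"message,omitempty"`
}

// checkImage allows an image only if it is hosted in the given registry.
func checkImage(image, registry string) response {
	if strings.HasPrefix(image, registry+"/") {
		return response{Allowed: true}
	}
	return response{
		Allowed: false,
		Message: "image must come from " + registry,
	}
}

func main() {
	images := []string{"repo.example.com/nginx:1.19", "docker.io/nginx:1.19"}
	for _, image := range images {
		// Marshal cannot fail for this plain struct, so the error is ignored.
		out, _ := json.Marshal(checkImage(image, "repo.example.com"))
		fmt.Println(image, "->", string(out))
	}
}
```

A real hook would take the image from the resource under review and pass the verdict back to shell-operator; the sketch covers the rule only.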

Another innovation is that you can now delete a whole group of metrics by returning the expire action for that group:

{"group":"group_name_1", "action":"expire"}

It is useful when objects that are tracked by metrics get deleted. A detailed example is available in the documentation.
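As a rough illustration, the sketch below builds such operations and prints them as JSON lines. Only the group and action keys come from the line above; the name, value, and labels fields, the "set" action, and the tracked_objects metric name are assumptions made for this example, so check the documentation for the exact format.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// metricOp is one metric operation emitted by a hook.
// Only "group" and "action" are taken from the article; the remaining
// fields are assumptions made for this sketch.
type metricOp struct {
	Group  string            `json:"group"`
	Name   string            `json:"name,omitempty"`
	Action string            `json:"action"`
	Value  *float64          `json:"value,omitempty"`
	Labels map[string]string `json:"labels,omitempty"`
}

// expireGroup builds the operation that drops every metric of a group.
func expireGroup(group string) metricOp {
	return metricOp{Group: group, Action: "expire"}
}

// encode renders an operation as a single JSON line.
func encode(op metricOp) string {
	out, err := json.Marshal(op)
	if err != nil {
		return ""
	}
	return string(out)
}

func main() {
	value := 1.0
	// While the tracked object exists, the hook reports a metric in the group.
	fmt.Println(encode(metricOp{
		Group:  "group_name_1",
		Name:   "tracked_objects", // hypothetical metric name
		Action: "set",
		Value:  &value,
		Labels: map[string]string{"kind": "Pod"},
	}))
	// Once the object is gone, the hook expires the whole group.
	fmt.Println(encode(expireGroup("group_name_1")))
}
```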

Other significant innovations in shell-operator fall under one of these categories:

1. Improvements in resource consumption and performance

  • The resynchronization period of informers is now randomized. Previously, all informers accessed the API server simultaneously, which created extra load.
  • A failing hook is now rerun with an exponentially increasing delay between attempts.
  • Queues now use read locks for read operations instead of a single lock for both reads and writes.
  • Metrics for CPU time and memory consumption have been added for each hook (see shell_operator_hook_run_sys_cpu_seconds in METRICS).
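The first two items are easy to picture in code. Below is a minimal sketch of both ideas, not the actual shell-operator implementation: the function names and the baseDelay and maxDelay values are hypothetical and were picked for illustration only.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// Hypothetical settings; shell-operator's real values may differ.
const (
	baseDelay = time.Second
	maxDelay  = 30 * time.Second
)

// backoffDelay returns the pause before retry number attempt (0-based):
// base, 2*base, 4*base, and so on, capped at max.
func backoffDelay(attempt int, base, max time.Duration) time.Duration {
	delay := base
	for i := 0; i < attempt && delay < max; i++ {
		delay *= 2
	}
	if delay > max {
		return max
	}
	return delay
}

// jitteredPeriod spreads a resync period over [period, period+spread)
// so that informers do not hit the API server at the same moment.
// int63n is the random source, e.g. rand.Int63n.
func jitteredPeriod(period, spread time.Duration, int63n func(int64) int64) time.Duration {
	if spread <= 0 {
		return period
	}
	return period + time.Duration(int63n(int64(spread)))
}

func main() {
	for attempt := 0; attempt < 7; attempt++ {
		fmt.Printf("retry %d after %v\n", attempt, backoffDelay(attempt, baseDelay, maxDelay))
	}
	for i := 0; i < 3; i++ {
		fmt.Println("informer resync in", jitteredPeriod(5*time.Minute, time.Minute, rand.Int63n))
	}
}
```

Passing the random source as a function keeps jitteredPeriod deterministic in tests while still allowing rand.Int63n in production code.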

2. Changes in the build process

  • The flant/shell-operator image now boasts AMD64, ARM, and ARM64 support (greetings to Raspberry Pi fans!).
  • The shell-operator binary is now built statically and should work on any Linux distribution.
  • All flant/shell-operator images with Bash, kubectl, and jq are now based on Alpine only. If you need a different distribution, you can take the binary from the main image; a sample Dockerfile is available in the examples.
  • The .git directory (that got into the image by mistake) has been deleted.
  • Main components are updated: Alpine 3.12, kubectl 1.19.4, Go 1.15.
  • The jq binary is built using the same commit as libjq* to get rid of the performance issues of jq-1.6 (#206).

* By the way, libjq-go is our small Open Source project with CGO bindings for jq. It was developed for shell-operator, but recently we have discovered another use case — the Xbus project. Xbus is a platform for integrating enterprise systems built on top of NATS and developed by the French company CloudCrane SAS. It is great to see the power of Open Source when even such modest projects come in handy to others.

3. Less significant changes

  • Warnings about hook files without the execution flag (+x) are written into the log at startup.
  • You can now build the project without CGO. This makes shell-operator easier to use in other projects (if a fast jqFilter handler is not required).
  • shell_lib.sh now allows you to include the shell framework into the hook with just a single line of code. You can find an example of using this library in the KubeCon presentation mentioned above.

Noticeable changes in addon-operator v1.0.0-rc1

The last release of addon-operator took place at the beginning of 2020. We have made a lot of changes since then.

First of all, it now supports OpenAPI schemas for values. You can define contracts both for the values required by Helm and for the config values that are stored in a ConfigMap and help the user configure modules.

For example, the schema below defines two required fields for global values (project and clusterName) and two optional fields (the clusterHostname string and the discovery object with no restrictions on its keys):

# /global/openapi/config-values.yaml
type: object
additionalProperties: false
required:
  - project
  - clusterName
minProperties: 2
properties:
  project:
    type: string
  clusterName:
    type: string
  clusterHostname:
    type: string
  discovery:
    type: object

(You can learn more in the documentation.)
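To show what this contract means in practice, here is a hand-written Go sketch that applies the same rules to a set of values. It is an illustration only: addon-operator validates values against the schema itself, and the function name validateGlobalConfig and the error texts are made up for this example.

```go
package main

import (
	"fmt"
	"sort"
)

// validateGlobalConfig applies the rules of the schema above by hand and
// returns a sorted list of violations (empty when the values are valid).
func validateGlobalConfig(values map[string]interface{}) []string {
	var errs []string

	// required: project and clusterName must be present
	// (this also satisfies minProperties: 2).
	for _, key := range []string{"project", "clusterName"} {
		if _, ok := values[key]; !ok {
			errs = append(errs, key+": is required")
		}
	}

	for key, value := range values {
		switch key {
		case "project", "clusterName", "clusterHostname":
			if _, ok := value.(string); !ok {
				errs = append(errs, key+": must be a string")
			}
		case "discovery":
			// Any keys are accepted inside the object.
			if _, ok := value.(map[string]interface{}); !ok {
				errs = append(errs, key+": must be an object")
			}
		default:
			// additionalProperties: false forbids unknown keys.
			errs = append(errs, key+": is not allowed")
		}
	}

	sort.Strings(errs)
	return errs
}

func main() {
	valid := map[string]interface{}{
		"project":     "demo",
		"clusterName": "main",
		"discovery":   map[string]interface{}{"anything": 1},
	}
	invalid := map[string]interface{}{
		"project": "demo",
		"region":  "eu",
	}
	fmt.Println(validateGlobalConfig(valid))
	fmt.Println(validateGlobalConfig(invalid))
}
```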

Another major achievement is the experimental support for Go hooks. For them to work, you will have to compile your addon-operator by adding imports containing paths to the hooks. An example of their use is available in the 700-go-hooks directory.

Here is a global hook written in Go, taken from the example above:

package global_hooks

import "github.com/flant/addon-operator/sdk"

var _ = sdk.Register(&GoHook{})

type GoHook struct {
    sdk.CommonGoHook
}

func (h *GoHook) Metadata() sdk.HookMetadata {
    return h.CommonMetadataFromRuntime()
}

func (h *GoHook) Config() *sdk.HookConfig {
    return h.CommonGoHook.Config(&sdk.HookConfig{
        YamlConfig: `
configVersion: v1
onStartup: 10
`,
        MainHandler: h.Main,
    })
}

func (h *GoHook) Main(input *sdk.BindingInput) (*sdk.BindingOutput, error) {
    input.LogEntry.Infof("Start Global Go hook")
    return nil, nil
}

The implementation of the corresponding SDK is currently alpha and is not particularly well-documented. However, if you find this feature fascinating, feel free to ask questions in the comments or in GitHub Discussions.

Other notable changes to the addon-operator include:

  • Support for installing modules using Helm 3.
  • The introduction of the “convergence” and “convergence at start” concepts (this is what the restart cycle for all modules is called). An endpoint for the readiness probe was added: the addon-operator’s pod becomes Ready when all the modules are successfully started (i.e., “convergence at start” is reached).
  • The ability to enable modules from global hooks. It is now easier to adjust the list of modules (before, you could only disable a module using its own enabled-script).
  • Informers as well as Synchronization for K8s hooks are now run in individual queues. Also, you can disable waiting for these hooks to complete on startup.
  • The addon-operator’s image is now similar to that of shell-operator: Alpine is used as the base system, the image supports various architectures, the binary is built statically.
  • Additional metrics are available for monitoring the state (see METRICS for more information).
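The readiness behavior from the list above can be pictured with a small sketch. This is not addon-operator code: the startupTracker type, the module names, and the /ready path are hypothetical, and the sketch only shows the idea of reporting Ready once every module has started.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
)

// startupTracker records which modules have not finished their first run.
type startupTracker struct {
	mu      sync.RWMutex
	pending map[string]bool
}

func newStartupTracker(modules ...string) *startupTracker {
	t := &startupTracker{pending: make(map[string]bool)}
	for _, m := range modules {
		t.pending[m] = true
	}
	return t
}

// markStarted notes that a module has started successfully.
func (t *startupTracker) markStarted(module string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	delete(t.pending, module)
}

// converged reports whether every module has started.
func (t *startupTracker) converged() bool {
	t.mu.RLock()
	defer t.mu.RUnlock()
	return len(t.pending) == 0
}

// ServeHTTP answers a readiness probe: 200 once converged, 503 before that.
func (t *startupTracker) ServeHTTP(w http.ResponseWriter, _ *http.Request) {
	if t.converged() {
		w.WriteHeader(http.StatusOK)
		return
	}
	w.WriteHeader(http.StatusServiceUnavailable)
}

// probe sends one in-memory readiness request and returns the status code.
func probe(t *startupTracker) int {
	rec := httptest.NewRecorder()
	t.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/ready", nil))
	return rec.Code
}

func main() {
	tracker := newStartupTracker("module-a", "module-b")
	fmt.Println("before convergence:", probe(tracker))
	tracker.markStarted("module-a")
	tracker.markStarted("module-b")
	fmt.Println("after convergence:", probe(tracker))
}
```

The tracker takes a read lock for the probe and a write lock only when a module changes state, so frequent probes do not block each other.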

Addon-operator has also received many improvements that were initially implemented in shell-operator. In addition, we have updated the documentation and made several minor fixes.

New users of shell-operator

Over this time, shell-operator has gained not only new features but new users as well. Among them, we would like to highlight the following projects:

  • Kafka DevOps by Confluent. This project implements a simulated production environment running a streaming application that targets Apache Kafka on Confluent Cloud. The environment is based on Kubernetes, and its resources and applications are managed via declarative infrastructure. Among other things, it uses two operators, Confluent Cloud Operator and Kafka Connect Operator, both of which are based on shell-operator. You can learn more about this project in the authors’ blog. They have also recently recorded a podcast in which they talk about Kafka DevOps and share their thoughts on shell-operator.
  • Edukates, an interactive learning environment for Kubernetes, has prepared a workshop exploring the use of shell-operator (however, we could not find its text version on the project’s website).
  • A Docker Captain from Germany developed a special controller for updating DNS records when the Traefik pod restarts. He later learned about shell-operator and switched his project to it.
  • A Solution Architect at Red Hat is working on the so-called r53-operator, a “custom domain operator” that manages Ingress domains in AWS Route 53.

Please share your experience of using shell-operator in our recently launched GitHub Discussions. A variety of use cases and scenarios can help the global community of engineers and make shell-operator easier to use. Addon-operator is not as popular as shell-operator, so we would be even more grateful if you shared your examples of its use.

Conclusion

Formal GitHub links: flant/shell-operator and flant/addon-operator.

And some final thoughts. We have long been using shell-operator and addon-operator in our everyday activities. Their main issues are well studied and eliminated, and currently, we mostly add new features to these two projects. In the near future, we are going to implement support for conversion webhooks in shell-operator. It will also be able to manage Kubernetes objects without kubectl and to receive a set of actions from a hook (see #94, #239).

In fact, both projects have long been out of beta status. That is why we have decided to bring their versions in line with reality and announce rc1 versions. The next shell-operator release this year may be the long-awaited v1.0.0.

P.S. Last November, shell-operator passed the 1000-star mark on GitHub, while addon-operator has accumulated a more modest 250 stars. Many thanks to everyone interested in the projects!