Announcing Deckhouse: the Kubernetes Platform from Flant Is Now Generally Available
Today, we are delighted to announce the first Open Source release of Deckhouse. Developing and operating the platform in numerous, very diverse use cases took three and a half years. Deckhouse currently helps us maintain over 170 production clusters (3,500+ nodes) with more than 3,000 applications running. Deckhouse is the quintessence of our experience in operating Kubernetes clusters and the culmination of all the production-related activities we’ve had over the past several years.
The HighLoad++ conference this May served as the first public venue to introduce Deckhouse to a broader audience. Following the formal presentation, the first interested persons received tokens for early access to the platform. More than 300 people have already tried Deckhouse on their own and the total number of active Deckhouse installations (either as a standalone solution or as part of a managed Kubernetes service) has now risen above 2,000. Now, it is time to share our experience in automating Kubernetes with the broader community!
Deckhouse: How and Why
Deckhouse came naturally as part of Flant’s everyday activities. We saw great potential in Kubernetes for infrastructure management, and we thus adopted it as a de facto standard. We began to deploy more and more applications (client ones as well as our own – dogfooding is definitely our thing), all the while gaining the experience we needed.
We rapidly amassed experience operating Kubernetes clusters and established the best practices we could discover. However, all this required standardization. Engineers from different teams within the company were responsible for cluster maintenance, and the approaches and solutions they devised were relevant to everyone.
This nuance was yet another reason to accumulate the best practices in a single solution. That solution is now known as Deckhouse. At first, the bright minds of SRE/DevOps engineers from different teams were brought together to develop it. Over time, however, responsibility for the product was handed over to a dedicated team. Such a transfer of all Kubernetes-related platform tasks to a new team freed the engineers on the other teams from having to handle these problems. What they now had at their disposal was a completely ready-to-run Kubernetes that “just works”. The teams were able to focus on other tasks that were more relevant to the needs of developers and the necessities of our clients’ businesses.
What are these problems specifically? Our teams required a Kubernetes that would:
- Run on any infrastructure. This one is of particular importance given the diversity of infrastructure requirements our customers have. Some of them use bare-metal servers and virtual machines, others take advantage of public cloud providers’ offerings, while still others use a private cloud or some combination of all of the above. In fact, the ability to run on any infrastructure was stated as one of the main advantages Kubernetes offered! But most of us failed to notice that cloud providers actually “had taken away” this feature from regular K8s users in their managed solutions.
- Provide all the necessary tools for running production applications. In other words, the Kubernetes installation must provide the full-fledged toolset required for running and maintaining applications rather than being a “do-it-yourself” constructor. We will discuss these tools later.
- Be fully (and mostly self-) maintainable. All cluster maintenance tasks, such as upgrading components and monitoring K8s itself, must be handled either automatically or by the platform team.
Our in-house product, which later became known as Deckhouse, was designed to meet these demands.
What is Deckhouse?
A platform, not a cluster
So what sets Deckhouse apart from other Kubernetes implementations? Let’s first recall a great analogy that Dmitry Stolyarov, Flant’s CTO, provided:
- What is vanilla Kubernetes? It is an empty apartment.
- Can you live in it? Yes, you can. But it will be neither comfortable nor pleasant. In most cases, you will also need furniture, decoration, and some household supplies.
- Deckhouse, on the other hand, is a fully furnished apartment (with well-designed ergonomics) with everything you need to enjoy a comfortable, carefree life.
The term “platform” seems much more appropriate in this case. That is why the main product of the new project is called Deckhouse Platform.
In fact, OpenShift by Red Hat belongs to the same category. Sometimes, users confuse such platforms with managed offerings such as GKE or AKS, but that’s incorrect. The latter do not offer a comprehensive platform: they give you a good foundation and bare walls, so to speak, while essential aspects such as monitoring and security remain overlooked.
A Feature-Complete Solution
If you discard that fundamental point and dive into the difference between Deckhouse and popular managed solutions, you will quickly see the following:
- Cloud providers (AWS, GKE, AKS) adhere to the so-called shared responsibility model. Its gist can be described as “take it and do it”, while Deckhouse and similar platforms are designed as “take it and use it” solutions.
- The Deckhouse Platform does not rely on any specific infrastructure. It provides fully identical clusters + unified approaches for configuring and managing these clusters (on any infrastructure).
- With cloud providers, you have to re-provision virtual machines to upgrade Kubernetes nodes. Deckhouse does this in real-time in most cases (Linux kernel + system component upgrades being the only exception).
- Cloud providers may have additional features; however, they are few in number and are usually tied to the specific infrastructure rather than adhering to the standards of the cloud-native industry.
Let us illustrate the last point with several examples.
- Updating an EKS cluster. Here is the official guide. Each third-party component (comparable to a piece of furniture in the apartment), including CNI, CoreDNS, etc., must be processed individually while the consequences of actions must be transparent and predictable. In contrast, with Deckhouse, all you have to do is edit a single line containing the Kubernetes version to update the cluster.
- Integrating with external authentication and authorization providers (LDAP, GitHub, etc.). Surely, there is a relevant recipe. But would you rather a) follow this convoluted and cumbersome scheme or b) add a couple of lines to the Deckhouse config?
- Scaling applications using custom metrics. Once again, a convoluted and cumbersome scheme is used which involves installing additional components. Once again, you are responsible for the consequences. With Deckhouse – you guessed it – the process is much more straightforward.
In short, we guarantee that all Deckhouse components will work as intended while updates will be performed in a timely and correct manner. This is just as important for us as it is for Deckhouse users since we use this platform for our own needs as well.
The Technological Foundation
Deckhouse Platform is based on the upstream version of Kubernetes and Open Source components which are considered standard in the cloud-native ecosystem. The platform integrates them into a unified tool. It adds all the components required for the production environments, thus minimizing manual manipulations.
What are those standard components exactly? The list includes some well-known names in the cloud-native community, such as CoreDNS, cert-manager, nginx, Prometheus + Grafana, dex, Istio, etc. (See the complete list in the documentation; below, we will discuss some of the main features based on these components).
All these components have been integrated into the Deckhouse Platform as modules. That means that the configurations of all these modules are well thought out, so you can make full use of them hassle-free. Also, these modules are regularly updated (users do not have to worry about updating them on their own). The module settings are managed through Custom Resource Definitions. These resources were designed with minimalist intentions: they provide a minimal set of parameters, thus reducing the possibility of errors.
Under the hood, Deckhouse builds on two other Open Source projects by Flant:
- the shell-operator enables you to create scripts* that are run in response to specific events in the Kubernetes cluster.
- the addon-operator automatically manages Kubernetes modules that are created using the shell-operator.
* For prototyping, we used Bash scripts for most of the modules, as it was the easiest (fastest) method available. After making sure the prototypes were correct, we rewrote them in Go, since Go code is easier to read, maintain, and test.
NB: The shell-operator and addon-operator are not tied to Deckhouse and can be used independently. For instance, applications by Adobe, KubeSphere, and Confluent use them as well.
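To give a feel for what "scripts run in response to cluster events" means, here is a minimal sketch of a shell-operator hook written in Python. It assumes the hook protocol as we understand it from the shell-operator documentation: the hook prints its binding configuration when called with `--config`, and otherwise reads a JSON list of binding contexts from the file named in the `BINDING_CONTEXT_PATH` environment variable. The binding name `pods` and the message format are made up for illustration; this is not one of the Deckhouse modules.

```python
#!/usr/bin/env python3
"""Sketch of a shell-operator hook that reacts to newly created Pods."""
import json
import os
import sys

# Binding configuration printed when the hook is called with --config.
# It asks shell-operator to run this hook whenever a Pod is added.
HOOK_CONFIG = {
    "configVersion": "v1",
    "kubernetes": [
        {
            "name": "pods",  # hypothetical binding name
            "apiVersion": "v1",
            "kind": "Pod",
            "executeHookOnEvent": ["Added"],
        }
    ],
}


def describe_events(binding_contexts):
    """Return one message per 'Added' event found in the binding contexts."""
    messages = []
    for context in binding_contexts:
        # Skip the initial Synchronization context; handle only watch events.
        if context.get("type") != "Event":
            continue
        if context.get("watchEvent") != "Added":
            continue
        metadata = context.get("object", {}).get("metadata", {})
        messages.append(
            f"Pod {metadata.get('namespace')}/{metadata.get('name')} was added"
        )
    return messages


def main(argv):
    if "--config" in argv:
        print(json.dumps(HOOK_CONFIG))
        return
    path = os.environ.get("BINDING_CONTEXT_PATH")
    if path is None:
        return  # not started by shell-operator, nothing to do
    with open(path) as f:
        binding_contexts = json.load(f)
    for message in describe_events(binding_contexts):
        print(message)


if __name__ == "__main__":
    main(sys.argv[1:])
```

Keeping the event handling in a plain function (`describe_events`) makes the hook easy to test without a cluster, which is the same concern that motivated moving the prototypes from Bash to Go.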
Features of the Deckhouse Platform
The product’s key features are based on feature requests that naturally emerged over the drafting/designing process. Below is the list of the essential features that were implemented in Deckhouse:
1. Infrastructure Agnostic
As we previously noted, you can create K8s clusters on any infrastructure: bare metal, virtual machines, or clouds. The key point is that clusters will be completely identical regardless of the infrastructure that was used. This means that they have the same:
- Kubernetes version (identical to the last bit);
- approaches to managing the cluster and the respective API;
- principles of operation within key subsystems: traffic balancing, cluster autoscaling, monitoring, authentication, and authorization;
- as well as all the other modules available/used.
As for support for public cloud providers, it is worth noting that Deckhouse does not currently use any managed solutions (EKS, GKE, AKS…). Instead, the Deckhouse distribution runs on regular virtual resources. This ensures that all the clusters are identical and do not depend on providers’ specificities and limitations (imagine, for instance, not having to worry about your limit on the number of pods per node or having a lack of support for OIDC). In addition, this enables us to provide full technical support and valid guarantees.
NB. Nevertheless, you can still use managed solutions if you prefer to – after all, Deckhouse can be installed into any existing cluster just as easily as it installs into a new one. In the future, we plan to implement an option to install Deckhouse using a control plane managed by a cloud provider.
Below is the list of currently supported cloud providers:
- Amazon AWS;
- Microsoft Azure;
- Google Cloud Platform;
- OpenStack (both the native installations and cloud providers that use it, e.g. OVH Cloud);
- VMware vSphere (the same as OpenStack).
We will add new cloud providers to the list of those supported if there is a demand from the community (feel free to open your issue for that!).
2. Providing everything you need to maintain your production cluster
The Deckhouse Platform has the following built-in features (among other things):
- monitoring based on cloud-native tools, covering the full cycle, i.e.:
  - collecting metrics: an extensive collection of exporters that support authorized scrapers as well as a method for collecting metrics from user applications;
  - storing metrics: Prometheus-based long-term and detailed storage (along with query caching and disk size adjustment relevant to actual needs);
  - analyzing metrics: an extensive collection of alerts that promptly notify the user in the event of any anomalies;
  - visualizing metrics: a large inventory of Grafana dashboards that speed up troubleshooting considerably by letting the user drill down to the root cause of the problem;
- Ingress controller (based on the NGINX Ingress Controller) for routing user traffic. It fully supports external load balancers (whether cloud-based or stand-alone) or can work directly with users. Another advantage is that controllers are maintained in a seamless fashion;
- horizontal (HPA) and vertical (VPA) application autoscaling, which can be based on any custom metric available in Prometheus (in addition to CPU and RAM triggers – in this sense, Deckhouse compares favorably to vanilla Kubernetes);
- user authentication via external providers (OIDC, LDAP, Google, GitHub, etc.) and flexible access restrictions for users with pre-configured roles for most use cases. The corresponding configuration is based on dex and is very straightforward;
- application log collection using Vector, with the logs forwarded to the selected storage;
- Istio service mesh with comprehensive observability and traffic management capabilities. With Istio, you can use federation or multi-cluster deployment models and perform a seamless upgrade for control-plane;
- and more: a Kubernetes web dashboard, SSL certificates management using cert-manager, connection to the cluster over OpenVPN, access control for applications (in a variety of ways), a ready-made priority classes system, descheduler, etc.
Note that all Docker images of modules based on third-party software are rebuilt idempotently using proven Alpine, Ubuntu, etc. images, meaning that:
- in the process of building system component images, versions of third-party software products are pinned and upgraded as the platform develops;
- the process of upgrading basic images and components based on them is performed in a centralized manner. Such an approach offers a quick fix for certain critical CVEs (Common Vulnerabilities and Exposures).
See the complete list of modules and their settings in the documentation.
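The custom-metric autoscaling mentioned in the list above builds on the standard Kubernetes Horizontal Pod Autoscaler. As a rough illustration of what such scaling computes, the sketch below applies the replica formula documented for the upstream HPA (desired = ceil(current replicas × current metric / target metric), with a default tolerance of 0.1) to a per-pod metric. The metric (requests per second per pod) and the numbers are invented for the example; this is not Deckhouse code, and a real HPA applies further rules (min/max replicas, stabilization windows) that are omitted here.

```python
import math


def desired_replicas(current_replicas, current_value, target_value, tolerance=0.1):
    """Replica count suggested by the upstream HPA formula for one metric.

    current_value and target_value are per-pod averages of the same metric,
    e.g. a hypothetical "requests per second" series taken from Prometheus.
    """
    if target_value <= 0:
        raise ValueError("target_value must be positive")
    ratio = current_value / target_value
    # Within the tolerance band the HPA leaves the workload untouched.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return max(1, math.ceil(current_replicas * current_value / target_value))


if __name__ == "__main__":
    # 4 pods each serving 150 rps against a target of 100 rps per pod.
    print(desired_replicas(4, 150, 100))
    # A 5% deviation stays inside the tolerance, so nothing changes.
    print(desired_replicas(4, 105, 100))
```

The point of scaling on such a metric rather than on CPU or RAM is that it tracks the load the application actually experiences.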
3. Renders K8s usage more straightforward thanks to the NoOps approach
At Flant, two engineers can easily handle maintaining over 170 clusters. This is because most of the platform components are managed automatically, including:
- system software — the kernel and CRI (container runtime interface);
- Kubernetes core components — kubelet, control plane, etcd, and their certificates;
- Deckhouse Platform components.
We have already shared our experience upgrading a bunch of Kubernetes clusters to new versions. Of course, you choose the stability level that suits you best.
4. Deploys clusters in 8 minutes
You can deploy a ready-to-run Kubernetes cluster in just 8 minutes with a couple of CLI commands. Ready-made configurations are available for each cloud provider.
5. Offers a 99.95% SLA guarantee
Thanks to the NoOps approach and thorough testing, we have achieved a rather high level of reliability for the platform. We can guarantee a 99.95% SLA even without direct access to the customer’s infrastructure (and an even higher one if we do have direct access). For SLA monitoring, Deckhouse provides a dedicated component responsible for monitoring critical subsystems. It features a status page that displays the current state of the service and a detailed dashboard.
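To put the 99.95% figure in perspective, the short sketch below converts an availability percentage into a downtime budget. The periods used (30 and 365 days) are assumptions for illustration only; the text above does not say over which period the SLA is measured.

```python
def allowed_downtime_minutes(sla_percent, period_days):
    """Downtime budget, in minutes, implied by an availability percentage."""
    if not 0 <= sla_percent <= 100:
        raise ValueError("sla_percent must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (100 - sla_percent) / 100


if __name__ == "__main__":
    # Assumed periods: a 30-day month and a 365-day year.
    for days in (30, 365):
        budget = allowed_downtime_minutes(99.95, days)
        print(f"{days} days at 99.95%: about {budget:.1f} minutes of downtime")
```

Under these assumptions, 99.95% leaves roughly 21.6 minutes of downtime per 30-day month, or a little under four and a half hours per year.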
Deckhouse Platform Editions
Deckhouse is an Open Source project. You can participate in its development on GitHub (and we encourage you to do so!). The platform features two editions: the free Community Edition (CE) and the commercial Enterprise Edition (EE). The CE version is distributed via GitHub under the free Apache 2.0 license.
- Deckhouse CE comes with all the platform’s core features, including deployment to the public cloud and operation on bare metal. This edition is suitable for those who wish to try out the Deckhouse platform on their own (i.e., without vendor support).
- Deckhouse EE includes additional features, e.g. deploying clusters to OpenStack and VMware vSphere, Istio service mesh, several advanced features for bare-metal clusters, and various subscription options with Flant support. The source code of Enterprise Edition is also open, but it’s neither Open Source nor free to use. The Enterprise Edition of Deckhouse can be considered an alternative to Red Hat OpenShift, Platform9, etc.
The Managed Deckhouse service is available to those who would like to benefit from the Flant experience in managing and maintaining Kubernetes clusters. It is based on the Deckhouse Platform Enterprise Edition and implies that our team is fully responsible for the operation of the K8s infrastructure.
Give It a Try
Start with the getting started guide, which walks you step by step through installing Deckhouse Platform on various kinds of infrastructure: an existing cluster, a bare-metal server, AWS, GCP, Azure, and OpenStack.
We also provide free trial tokens for accessing Enterprise Edition for 30 days.
- You can learn more about the features and aspects of different platform editions on the Deckhouse website.
- Follow us on Twitter to stay informed.
- Do not hesitate to contact us on our Telegram channel should you have any questions or require any further information.
- Please star Deckhouse on GitHub if you like it!