24 July 2020
Andrey Sidorov, software engineer

How to modify etcd data of your Kubernetes directly (without K8s API)

Have you ever thought about a “low-level” way of changing the etcd data of your Kubernetes cluster? That is, altering etcd-stored values without any of the usual Kubernetes tooling, such as its native CLI utilities or even the API. We had to do exactly that, and here’s our story: why and how we did it.

How it all started

An increasing number of customers (mostly developers) ask us for access to the Kubernetes cluster so that they can interact with internal services. They want to connect directly to a database or a service, to connect their local application to other applications within the cluster, etc.

For example, you might need to connect to the memcached.staging.svc.cluster.local service from your local machine. We accomplish that via a VPN inside the cluster to which the client connects. To do this, we announce the pod and service subnets and push the cluster’s DNS to the client. As a result, when the client tries to connect to the memcached.staging.svc.cluster.local service, the request goes to the cluster’s DNS, which returns the address of this service from the cluster’s service network or the address of the pod.

We configure K8s clusters using kubeadm. In our setup, the service subnet is 192.168.0.0/16 by default, and the pod subnet is 10.244.0.0/16. Generally, this approach works just fine. However, there are a couple of subtleties:

  • The 192.168.*.* subnet is often used in our customers’ offices, and even more often in the home offices of developers. That is a recipe for disaster: home routers use the same address space, while the VPN pushes these very subnets from the cluster to the client, so the routes conflict.
  • We have several clusters (production, stage, multiple dev clusters). By default, all of them have the same subnets for pods and services, which makes it very difficult to use services in several clusters simultaneously.
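
The conflict described above is easy to check for before connecting a client. Below is a minimal sketch in Go, not part of our tooling: the overlaps helper and the 192.168.1.0/24 home network are hypothetical and serve only as an illustration. It relies on the standard net/netip package, which needs Go 1.18 or newer.

```go
package main

import (
	"fmt"
	"net/netip"
)

// overlaps reports whether two CIDR blocks share at least one address.
func overlaps(a, b string) (bool, error) {
	pa, err := netip.ParsePrefix(a)
	if err != nil {
		return false, err
	}
	pb, err := netip.ParsePrefix(b)
	if err != nil {
		return false, err
	}
	return pa.Overlaps(pb), nil
}

func main() {
	homeLAN := "192.168.1.0/24" // hypothetical home router network
	for _, cluster := range []string{"192.168.0.0/16", "10.244.0.0/16", "172.24.0.0/16"} {
		clash, err := overlaps(cluster, homeLAN)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s overlaps %s: %v\n", cluster, homeLAN, clash)
	}
}
```

With these inputs, only the 192.168.0.0/16 service subnet clashes with the home network, which is exactly the situation we wanted to get rid of.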

For quite a while now, we have been using different service and pod subnets for different clusters within the same project, so that every cluster has its own networks. At the same time, we maintain a large number of K8s clusters that we would prefer not to redeploy from scratch since they run many services, stateful applications, and so on.

At some point, we asked ourselves: how do we change a subnet in an existing cluster?

Searching for a solution

The most common way is to recreate all services of the ClusterIP type. You can also find advice like this:

“The following process has a problem: after everything configured, the pods come up with the old IP as a DNS nameserver in /etc/resolv.conf.”

“Since I still did not find the solution, I had to reset the entire cluster with kubeadm reset and init it again.”

Unfortunately, that does not work for everyone… Let’s define the problem for our case in more detail:

  • We use Flannel;
  • There are both bare metal and cloud Kubernetes clusters;
  • We would prefer to avoid having to redeploy all services in the cluster;
  • We would like to make the transition as hassle-free as possible;
  • The cluster runs Kubernetes 1.16.6 (however, our steps should fit other versions, too);
  • The goal is to replace the 192.168.0.0/16 service subnet with 172.24.0.0/16 in the cluster deployed using kubeadm.

As a matter of fact, we have long been tempted to investigate how Kubernetes stores its data in etcd and what can be done with this storage at all… So we just thought: “Why don’t we update the data in etcd by replacing old subnet IPs with the new ones?”

We looked for ready-made tools for modifying data in etcd… and nothing met our needs. But it’s not all bad: etcdhelper by OpenShift was a good starting point (thanks to its creators!). This tool can connect to etcd using certificates and read etcd data using the ls, get, and dump commands.

By the way, do not hesitate to share links if you are aware of tools for processing data in etcd directly!

Extending etcdhelper

Looking at etcdhelper, we thought: “Why don’t we extend this utility so that it can write data to etcd?”

Our efforts resulted in an updated version of etcdhelper with two new functions: changeServiceCIDR and changePodCIDR. Its source code is available here.

What do the new features do? Here is the algorithm of changeServiceCIDR:

  • we create a deserializer;
  • compile a regular expression to replace CIDR;
  • go through a list of ClusterIP services in the cluster and perform a few operations for each of them.

Here are our operations:

  • we decode the etcd value and place it in the Go object;
  • replace the first two bytes of the address using a regular expression;
  • assign the service an IP address from the new subnet’s address range;
  • create a serializer, convert the Go object to protobuf, write new data to etcd.
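
The address rewrite at the heart of these steps can be sketched as follows. This is a simplified illustration and not the actual etcdhelper code: the moveToCIDR helper and its name are hypothetical, it works on plain strings instead of decoded Service objects, and it assumes that both the old and the new subnets are IPv4 /16 networks (which is why replacing the first two octets is enough).

```go
package main

import (
	"fmt"
	"net"
	"regexp"
)

// firstTwoOctets matches the leading "A.B." part of a dotted IPv4 address.
var firstTwoOctets = regexp.MustCompile(`^\d{1,3}\.\d{1,3}\.`)

// moveToCIDR keeps the last two octets of ip and takes the first two
// from newCIDR. It is only meaningful for /16 networks (an assumption
// of this sketch).
func moveToCIDR(ip, newCIDR string) (string, error) {
	_, network, err := net.ParseCIDR(newCIDR)
	if err != nil {
		return "", err
	}
	if ones, bits := network.Mask.Size(); ones != 16 || bits != 32 {
		return "", fmt.Errorf("only IPv4 /16 networks are supported, got %s", newCIDR)
	}
	// Headless services carry "None" instead of an address; reject such values.
	if net.ParseIP(ip).To4() == nil {
		return "", fmt.Errorf("not an IPv4 address: %q", ip)
	}
	base := network.IP.To4()
	prefix := fmt.Sprintf("%d.%d.", base[0], base[1])
	return firstTwoOctets.ReplaceAllString(ip, prefix), nil
}

func main() {
	for _, ip := range []string{"192.168.0.1", "192.168.12.34"} {
		moved, err := moveToCIDR(ip, "172.24.0.0/16")
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(ip, "->", moved)
	}
}
```

In this sketch, a caller would skip services for which moveToCIDR returns an error (for example, headless services) rather than write anything back to etcd.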

The changePodCIDR function is essentially the same as changeServiceCIDR. The only difference is that instead of services, we edit the specification of nodes and replace the value of .spec.podCIDR with the new subnet.

Usage

Replacing serviceCIDR

This task is very straightforward to implement. However, it involves downtime while all the pods in the cluster are being recreated. First, we will describe the main steps, and later, we will share our thoughts on how to minimize that downtime.

Preparatory steps:

  • install the necessary software and build the patched etcdhelper tool;
  • back up your etcd and /etc/kubernetes.

Here is a summary of actions for changing serviceCIDR:

  • make changes in apiserver and controller-manager manifests;
  • reissue certificates;
  • modify the ClusterIP specification of services in etcd;
  • restart all pods in the cluster.

Below is a detailed description of the steps.

1. Install etcd-client for dumping the data:

apt install etcd-client

2. Build the etcdhelper tool:

  • Install golang:
export GOPATH=/root/golang
mkdir -p $GOPATH/local
curl -sSL https://dl.google.com/go/go1.14.1.linux-amd64.tar.gz | tar -xzvC $GOPATH/local
echo "export GOPATH=\"$GOPATH\"" >> ~/.bashrc
echo 'export GOROOT="$GOPATH/local/go"' >> ~/.bashrc
echo 'export PATH="$PATH:$GOPATH/local/go/bin"' >> ~/.bashrc
# make the go binary available in the current shell as well
export GOROOT="$GOPATH/local/go" PATH="$PATH:$GOPATH/local/go/bin"
  • Copy etcdhelper.go, download dependencies, build the tool:
wget https://raw.githubusercontent.com/flant/examples/master/2020/04-etcdhelper/etcdhelper.go
go get go.etcd.io/etcd/clientv3 k8s.io/kubectl/pkg/scheme k8s.io/apimachinery/pkg/runtime
go build -o etcdhelper etcdhelper.go

3. Back up the etcd data:

backup_dir=/root/backup
mkdir -p ${backup_dir}
cp -rL /etc/kubernetes ${backup_dir}
ETCDCTL_API=3 etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt --key=/etc/kubernetes/pki/etcd/server.key --cert=/etc/kubernetes/pki/etcd/server.crt --endpoints https://192.168.199.100:2379 snapshot save ${backup_dir}/etcd.snapshot

4. Switch the services subnet in the manifests of the Kubernetes control plane. Replace the value of the --service-cluster-ip-range parameter with the new subnet (172.24.0.0/16 instead of 192.168.0.0/16) in /etc/kubernetes/manifests/kube-apiserver.yaml and /etc/kubernetes/manifests/kube-controller-manager.yaml.

5. Since we are making changes to the service subnet for which kubeadm issues the apiserver certificates (among others), you have to reissue them:

5.1. Check which domains and IP addresses the current certificate is issued for:

openssl x509 -noout -ext subjectAltName </etc/kubernetes/pki/apiserver.crt
X509v3 Subject Alternative Name:
    DNS:dev-1-master, DNS:kubernetes, DNS:kubernetes.default, DNS:kubernetes.default.svc, DNS:kubernetes.default.svc.cluster.local, DNS:apiserver, IP Address:192.168.0.1, IP Address:10.0.0.163, IP Address:192.168.199.100

5.2. Prepare the basic config for kubeadm:

cat kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta1
kind: ClusterConfiguration
networking:
  podSubnet: "10.244.0.0/16"
  serviceSubnet: "172.24.0.0/16"
apiServer:
  certSANs:
  - "192.168.199.100" # master node's IP address

5.3. Delete the old crt and key files (you have to remove them in order to issue the new certificate):

rm /etc/kubernetes/pki/apiserver.{key,crt}

5.4. Reissue certificates for the API server:

kubeadm init phase certs apiserver --config=kubeadm-config.yaml

5.5. Check that the certificate is issued for the new subnet:

openssl x509 -noout -ext subjectAltName </etc/kubernetes/pki/apiserver.crt
X509v3 Subject Alternative Name:
    DNS:kube-2-master, DNS:kubernetes, DNS:kubernetes.default, DNS:kubernetes.default.svc, DNS:kubernetes.default.svc.cluster.local, IP Address:172.24.0.1, IP Address:10.0.0.163, IP Address:192.168.199.100
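
The IP address to look for in the output can be derived from the subnet itself: the kubernetes service takes the first address of the service subnet (192.168.0.1 before the change, 172.24.0.1 after it). Here is a minimal sketch of that arithmetic; the nthAddr helper is hypothetical and needs Go 1.18 or newer for net/netip. The tenth address is the one kubeadm conventionally gives to kube-dns; treat that as an assumption and verify it with kubectl -n kube-system get svc kube-dns.

```go
package main

import (
	"fmt"
	"net/netip"
)

// nthAddr returns the n-th address counted from the start of the subnet
// (n = 0 is the network address, n = 1 is the first address after it).
func nthAddr(cidr string, n int) (netip.Addr, error) {
	prefix, err := netip.ParsePrefix(cidr)
	if err != nil {
		return netip.Addr{}, err
	}
	addr := prefix.Masked().Addr()
	for i := 0; i < n; i++ {
		addr = addr.Next()
	}
	if !addr.IsValid() || !prefix.Contains(addr) {
		return netip.Addr{}, fmt.Errorf("%s has no address number %d", cidr, n)
	}
	return addr, nil
}

func main() {
	serviceSubnet := "172.24.0.0/16"
	for _, n := range []int{1, 10} {
		addr, err := nthAddr(serviceSubnet, n)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("address #%d of %s: %s\n", n, serviceSubnet, addr)
	}
}
```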

5.6. After reissuing the API server certificate, restart its container:

docker ps | grep k8s_kube-apiserver | awk '{print $1}' | xargs docker restart

5.7. Renew the certificate embedded in the admin.conf:

kubeadm alpha certs renew admin.conf

5.8. Edit the data in etcd:

./etcdhelper -cacert /etc/kubernetes/pki/etcd/ca.crt -cert /etc/kubernetes/pki/etcd/server.crt -key /etc/kubernetes/pki/etcd/server.key -endpoint https://127.0.0.1:2379 change-service-cidr 172.24.0.0/16

Caution! At this point, DNS stops resolving domain names in the cluster. This happens because the existing pods still have the old CoreDNS (kube-dns) address in /etc/resolv.conf, while kube-proxy has already rewritten the iptables rules for the new subnet. Below, we discuss possible ways to minimize the downtime.

5.9. Edit ConfigMaps in the kube-system namespace:

a) In this CM:

kubectl -n kube-system edit cm kubelet-config-1.16

Here, replace clusterDNS with the new IP address of the kube-dns service (to look it up, run kubectl -n kube-system get svc kube-dns).

b) In this CM:

kubectl -n kube-system edit cm kubeadm-config

— switch the data.ClusterConfiguration.networking.serviceSubnet parameter to the new subnet.

5.10. Since the kube-dns address has changed, you need to update the kubelet config on all nodes:

kubeadm upgrade node phase kubelet-config && systemctl restart kubelet

5.11. It is time to restart all pods in the cluster:

kubectl get pods --no-headers=true --all-namespaces |sed -r 's/(\S+)\s+(\S+).*/kubectl --namespace \1 delete pod \2/e'

Minimizing downtime

Here are a few ideas on how to minimize downtime:

  1. After editing the control plane manifests, you can create a new kube-dns service with a new name (e.g., kube-dns-tmp) and a new address (172.24.0.10).
  2. Then you can add an if condition to etcdhelper that prevents it from modifying the kube-dns service.
  3. Replace the old ClusterDNS address in all kubelets with the new one (meanwhile, the old service keeps running alongside the new one).
  4. Wait until all application pods are redeployed, either naturally or at an agreed time.
  5. Delete the kube-dns-tmp service and switch the kube-dns service itself to the new subnet (i.e., change its ClusterIP).

This plan shortens the downtime to about a minute: the time required to delete the kube-dns-tmp service and switch the subnet of the kube-dns service.
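
The if condition from step 2 could look roughly like the sketch below. It is not taken from etcdhelper: the shouldSkip helper is hypothetical, and the key layout (/registry/services/specs/<namespace>/<name>) is an assumption about how the cluster stores Service objects in etcd, so check it against your own etcd (for instance, with the ls command of etcdhelper) before relying on it.

```go
package main

import (
	"fmt"
	"strings"
)

// kubeDNSKeySuffix is an assumption about how the etcd key of the
// kube-dns Service ends: .../services/specs/<namespace>/<name>.
const kubeDNSKeySuffix = "/services/specs/kube-system/kube-dns"

// shouldSkip reports whether the etcd key belongs to the kube-dns Service,
// which has to keep its old address during the transition.
func shouldSkip(key string) bool {
	return strings.HasSuffix(key, kubeDNSKeySuffix)
}

func main() {
	keys := []string{
		"/registry/services/specs/default/kubernetes",
		"/registry/services/specs/kube-system/kube-dns",
		"/registry/services/specs/kube-system/kube-dns-tmp",
	}
	for _, key := range keys {
		if shouldSkip(key) {
			fmt.Println("skip  ", key)
			continue
		}
		fmt.Println("modify", key)
	}
}
```

Matching on the full suffix rather than on a substring keeps the temporary kube-dns-tmp service out of the exception.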

Modifying podNetwork

Along the way, we decided to modify podNetwork using our etcdhelper as well. Here is the required sequence of actions:

  • edit configurations in the kube-system namespace;
  • edit the manifest of the kube-controller-manager;
  • edit podCIDR directly in etcd;
  • restart all nodes in the cluster.

Below is a detailed description of the above actions:

1. Edit ConfigMaps in the kube-system namespace:

a) Here:

kubectl -n kube-system edit cm kubeadm-config

— replace data.ClusterConfiguration.networking.podSubnet with the new subnet (10.55.0.0/16).

b) Here:

kubectl -n kube-system edit cm kube-proxy

— specify the new data.config.conf.clusterCIDR: 10.55.0.0/16.

2. Edit the manifest of the controller-manager:

vim /etc/kubernetes/manifests/kube-controller-manager.yaml

— specify: --cluster-cidr=10.55.0.0/16.

3. Verify the current values of .spec.podCIDR, .spec.podCIDRs, .InternalIP, .status.addresses for all cluster nodes:

kubectl get no -o json | jq '[.items[] | {"name": .metadata.name, "podCIDR": .spec.podCIDR, "podCIDRs": .spec.podCIDRs, "InternalIP": (.status.addresses[] | select(.type == "InternalIP") | .address)}]'
[
  {
    "name": "kube-2-master",
    "podCIDR": "10.244.0.0/24",
    "podCIDRs": [
      "10.244.0.0/24"
    ],
    "InternalIP": "192.168.199.2"
  },
  {
    "name": "kube-2-master",
    "podCIDR": "10.244.0.0/24",
    "podCIDRs": [
      "10.244.0.0/24"
    ],
    "InternalIP": "10.0.1.239"
  },
  {
    "name": "kube-2-worker-01f438cf-579f9fd987-5l657",
    "podCIDR": "10.244.1.0/24",
    "podCIDRs": [
      "10.244.1.0/24"
    ],
    "InternalIP": "192.168.199.222"
  },
  {
    "name": "kube-2-worker-01f438cf-579f9fd987-5l657",
    "podCIDR": "10.244.1.0/24",
    "podCIDRs": [
      "10.244.1.0/24"
    ],
    "InternalIP": "10.0.4.73"
  }
]

4. Replace podCIDR by editing etcd directly:

./etcdhelper -cacert /etc/kubernetes/pki/etcd/ca.crt -cert /etc/kubernetes/pki/etcd/server.crt -key /etc/kubernetes/pki/etcd/server.key -endpoint https://127.0.0.1:2379 change-pod-cidr 10.55.0.0/16

5. Check if podCIDR has changed:

kubectl get no -o json | jq '[.items[] | {"name": .metadata.name, "podCIDR": .spec.podCIDR, "podCIDRs": .spec.podCIDRs, "InternalIP": (.status.addresses[] | select(.type == "InternalIP") | .address)}]'
[
  {
    "name": "kube-2-master",
    "podCIDR": "10.55.0.0/24",
    "podCIDRs": [
      "10.55.0.0/24"
    ],
    "InternalIP": "192.168.199.2"
  },
  {
    "name": "kube-2-master",
    "podCIDR": "10.55.0.0/24",
    "podCIDRs": [
      "10.55.0.0/24"
    ],
    "InternalIP": "10.0.1.239"
  },
  {
    "name": "kube-2-worker-01f438cf-579f9fd987-5l657",
    "podCIDR": "10.55.1.0/24",
    "podCIDRs": [
      "10.55.1.0/24"
    ],
    "InternalIP": "192.168.199.222"
  },
  {
    "name": "kube-2-worker-01f438cf-579f9fd987-5l657",
    "podCIDR": "10.55.1.0/24",
    "podCIDRs": [
      "10.55.1.0/24"
    ],
    "InternalIP": "10.0.4.73"
  }
]

6. Restart all nodes of the cluster one at a time.

7. Make sure that no node is left with the old podCIDR: if there is at least one such node, kube-controller-manager will not start, and pods in the cluster will not be scheduled.
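
A quick way to catch such a leftover node is to compare every node’s podCIDR with the new cluster subnet. The following is a minimal sketch under a few assumptions: the outsideCluster helper is hypothetical, the node values are pasted in by hand (in practice they would come from the kubectl get no output shown above), and net/netip requires Go 1.18 or newer.

```go
package main

import (
	"fmt"
	"net/netip"
)

// outsideCluster returns the node podCIDRs that are not fully contained
// in clusterCIDR.
func outsideCluster(clusterCIDR string, nodeCIDRs []string) ([]string, error) {
	cluster, err := netip.ParsePrefix(clusterCIDR)
	if err != nil {
		return nil, err
	}
	var stale []string
	for _, raw := range nodeCIDRs {
		node, err := netip.ParsePrefix(raw)
		if err != nil {
			return nil, err
		}
		// A node subnet is inside the cluster subnet if it is at least as
		// specific and its base address belongs to the cluster subnet.
		inside := node.Bits() >= cluster.Bits() && cluster.Contains(node.Addr())
		if !inside {
			stale = append(stale, raw)
		}
	}
	return stale, nil
}

func main() {
	// One node still has a podCIDR from the old 10.244.0.0/16 network.
	nodes := []string{"10.55.0.0/24", "10.244.1.0/24"}
	stale, err := outsideCluster("10.55.0.0/16", nodes)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("podCIDRs outside the new subnet:", stale)
}
```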

As a matter of fact, there are easier ways to change podCIDR (example). But still, we wanted to learn how to work with etcd directly since there are cases when editing Kubernetes objects right in etcd is the only possible solution (for example, there is no way to avoid downtime when changing the spec.clusterIP field of the Service).

Summary

In this article, we have explored the possibility of working with the data in etcd directly (i.e., without using the Kubernetes API). At times, this approach allows you to do some “tricky things”. We have successfully tested all the above steps using our etcdhelper on real K8s clusters. However, the whole scenario is still only a proof of concept (PoC). Please use it at your own risk.

Afterword

This article has been originally posted on Medium. New texts from our engineers are placed here, on blog.flant.com.