Kubernetes volume plugins evolution from FlexVolume to CSI

Back in the day, when Kubernetes was still young (v1.0.0 or so), there were volume plugins. They were required to connect persistent storage volumes to Kubernetes. Their number was relatively low at that time: GCE PD, Ceph, AWS EBS, and a few others were among the first supported storage providers.

They were usually bundled with Kubernetes itself, which is why they were called "in-tree" plugins. However, many developers felt that the range of available plugins was too limited, so they created their own solutions, integrated them into the Kubernetes core via patches, compiled their own versions of Kubernetes, and installed them on their servers. Over time, the Kubernetes developers realized that you cannot solve this problem by giving "a man a fish": you have to teach him "to fish". So, in version 1.2.0, they decided to include a "fishing rod" for those who need it…

FlexVolume plugin, or The minimum viable “fishing rod”

Kubernetes developers created the FlexVolume plugin, which is a logical wrapper providing the variables and methods for working with third-party FlexVolume drivers.

Let’s take a closer look at what a FlexVolume driver is. It is an executable file (a binary, a Python script, a Bash script, etc.) that takes command-line arguments as input and returns a JSON message with predefined fields. By convention, the first argument is a method name, and all the other arguments are its parameters.

Using CIFS Shares in OpenShift (diagram). The FlexVolume driver is right in the center

The FlexVolume driver must implement the following basic set of methods:

flexvolume_driver mount # mounts the volume to a directory in the pod
# expected output:
{
  "status": "Success"/"Failure"/"Not supported",
  "message": ""
}

flexvolume_driver unmount # unmounts the volume from a directory in the pod
# expected output:
{
  "status": "Success"/"Failure"/"Not supported",
  "message": ""
}

flexvolume_driver init # initializes the plugin
# expected output:
{
  "status": "Success"/"Failure"/"Not supported",
  "message": "",
  // defines whether the attach/detach methods are supported
  "capabilities": {"attach": true/false}
}

The attach and detach methods determine how kubelet will act when the driver is called. There are also two specific methods, expandvolume and expandfs, which allow volumes to be resized dynamically.

You can use our pull request to the Rook Ceph Operator as an example of what implementing the expandvolume method looks like, and of how it enables resizing volumes on the fly.
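To make the calling convention a bit more concrete, here is roughly what a mount call looks like from kubelet's side (the pod UID, volume name, and JSON options below are purely illustrative):

# kubelet invokes the driver with the method name first, then its parameters;
# for the mount method these are the target directory and a JSON options string
flexvolume_driver mount \
    /var/lib/kubelet/pods/<pod-uid>/volumes/k8s.io~nfs/nfs-volume \
    '{"server": "10.0.0.1", "share": "exports"}'

# the driver is expected to print the result to stdout, e.g.:
{"status": "Success"}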

Here is an example of a FlexVolume driver for NFS:

#!/bin/bash

usage() {
    err "Invalid usage. Usage: "
    err "\t$0 init"
    err "\t$0 mount <mount dir> <json params>"
    err "\t$0 unmount <mount dir>"
    exit 1
}

err() {
    echo -ne $* 1>&2
}

log() {
    echo -ne $* >&1
}

ismounted() {
    MOUNT=`findmnt -n ${MNTPATH} 2>/dev/null | cut -d' ' -f1`
    if [ "${MOUNT}" == "${MNTPATH}" ]; then
        echo "1"
    else
        echo "0"
    fi
}

domount() {
    MNTPATH=$1

    NFS_SERVER=$(echo $2 | jq -r '.server')
    SHARE=$(echo $2 | jq -r '.share')

    if [ $(ismounted) -eq 1 ] ; then
        log '{"status": "Success"}'
        exit 0
    fi

    mkdir -p ${MNTPATH} &> /dev/null

    mount -t nfs ${NFS_SERVER}:/${SHARE} ${MNTPATH} &> /dev/null
    if [ $? -ne 0 ]; then
        err "{ \"status\": \"Failure\", \"message\": \"Failed to mount ${NFS_SERVER}:${SHARE} at ${MNTPATH}\"}"
        exit 1
    fi
    log '{"status": "Success"}'
    exit 0
}

unmount() {
    MNTPATH=$1
    if [ $(ismounted) -eq 0 ] ; then
        log '{"status": "Success"}'
        exit 0
    fi

    umount ${MNTPATH} &> /dev/null
    if [ $? -ne 0 ]; then
        err "{ \"status\": \"Failed\", \"message\": \"Failed to unmount volume at ${MNTPATH}\"}"
        exit 1
    fi

    log '{"status": "Success"}'
    exit 0
}

op=$1

if [ "$op" = "init" ]; then
    log '{"status": "Success", "capabilities": {"attach": false}}'
    exit 0
fi

if [ $# -lt 2 ]; then
    usage
fi

shift

case "$op" in 
    mount)
        domount $*
        ;;
    unmount)
        unmount $*
        ;;
    *)
        log '{"status": "Not supported"}'
        exit 0
esac

exit 1

After creating the executable file, you have to deploy the driver to the Kubernetes cluster. The driver must be present on every cluster node at a predefined path. The default path is

/usr/libexec/kubernetes/kubelet-plugins/volume/exec/vendor_name~driver_name/

… however, the path may differ across Kubernetes distributions (OpenShift, Rancher, etc.).
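Once the driver is deployed, you can reference it in a volume specification by its vendor_name/driver_name. Here is a minimal sketch of a PersistentVolume that uses the NFS driver from the example above (the volume name, size, server, and share are illustrative):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-flex-pv
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteMany
  flexVolume:
    driver: "k8s.io/nfs"      # maps to the vendor_name~driver_name directory on the node
    options:
      server: "10.0.0.1"      # passed to the driver's mount method
      share: "exports"

kubelet translates the options map into the JSON parameters that are passed to the driver's mount call.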

FlexVolume constraints, or How to throw the bait correctly?

Deploying the FlexVolume driver to the cluster nodes is a non-trivial task. You can do it manually, but new nodes are likely to appear in the cluster, whether because a node is added manually, through automatic horizontal scaling, or, worse, because a malfunctioning node is replaced. In that case, persistent storage simply cannot be used on these nodes until you manually copy the FlexVolume driver to them.

However, one of Kubernetes' primitives, the DaemonSet, can solve this problem. When a new node appears in the cluster, it automatically gets a pod defined in the DaemonSet. That pod mounts a local volume pointing to the directory for FlexVolume drivers and, once running, copies the driver files to it.

Here is an example of such a DaemonSet for deploying the FlexVolume plugin:

apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: flex-set
spec:
  template:
    metadata:
      name: flex-deploy
      labels:
        app: flex-deploy
    spec:
      containers:
        - image: <deployment_image>  # image that contains the FlexVolume driver binary
          name: flex-deploy
          securityContext:
            privileged: true
          volumeMounts:
            - mountPath: /flexmnt
              name: flexvolume-mount
      volumes:
        - name: flexvolume-mount
          hostPath:
            path: <host_driver_directory>

… and an example of a Bash script for copying the FlexVolume driver:

#!/bin/sh

set -o errexit
set -o pipefail

VENDOR=k8s.io
DRIVER=nfs

driver_dir=$VENDOR${VENDOR:+"~"}${DRIVER}
if [ ! -d "/flexmnt/$driver_dir" ]; then
  mkdir "/flexmnt/$driver_dir"
fi

cp "/$DRIVER" "/flexmnt/$driver_dir/.$DRIVER"
mv -f "/flexmnt/$driver_dir/.$DRIVER" "/flexmnt/$driver_dir/$DRIVER"

while : ; do
  sleep 3600
done

Please note that the copy operation is not atomic. There is a real risk that kubelet will start using the driver before its preparation is complete, resulting in an error. The correct way is to copy the driver file under a different name first and then rename it (since the rename operation is atomic).

Using Ceph in the Rook Operator (diagram): the FlexVolume driver is inside the Rook Agent

The next problem with the FlexVolume driver is that most types of volumes require prerequisites on the node (e.g. the ceph-common package for Ceph). The FlexVolume plugin simply wasn't designed for such complex systems.

A creative solution to this problem is implemented in the FlexVolume driver of the Rook operator. The driver itself is a thin RPC client, and the IPC socket it talks to resides in the driver's directory. As we noted above, a DaemonSet is the perfect vehicle for delivering the driver files, since it automatically mounts the directory with the Rook driver as a volume. After copying is complete, the same pod listens on that IPC socket through the mounted volume and acts as a fully-functional RPC server; the ceph-common package is already installed in the pod's container. Using an IPC socket guarantees that kubelet talks to the pod running on the same node. Brilliant idea, isn't it?

Hasta la vista, in-tree plugins!

At some point, the Kubernetes developers realized there were 20 in-tree plugins for storage volumes, and each change in them, even a minor one, had to go through the full Kubernetes release cycle.

As it turns out, you have to update the entire cluster just to use a new plugin version! Additionally, you may encounter an unpleasant surprise: the new Kubernetes version might be incompatible with the current Linux kernel! So, you wipe your tears and beg your boss and clients for permission to update the Linux kernel and the Kubernetes cluster (with a possible downtime)…

Isn’t it strange and funny? Over time, it became evident to the whole community that the existing approach does not work, so the Kubernetes developers firmly decided to stop including new volume plugins in the core. On top of that, as we already know, a number of shortcomings had been revealed in the FlexVolume plugin implementation.

CSI, the last plugin included in the core, was intended to solve the problem of persistent storage once and for all. Its alpha version, briefly described as Out-of-Tree CSI Volume Plugins, was announced in Kubernetes 1.9.

Container Storage Interface, or The CSI 3000 spinning rod!

First of all, we would like to emphasize that CSI isn't just a volume plugin: it is a fully-fledged standard for creating custom components to work with data storage. Container orchestration systems, such as Kubernetes and Mesos, were supposed to "learn" how to work with components implemented according to this standard. Well, Kubernetes has successfully done that.

How does the Kubernetes CSI plugin work? The CSI plugin uses custom drivers (CSI drivers) created by third-party developers. The CSI driver for Kubernetes must contain at least these two components (pods):

  • Controller that manages persistent external volumes. It is implemented as a gRPC server and deployed via a StatefulSet primitive.
  • Node that mounts persistent external volumes on cluster nodes. It is also implemented as a gRPC server, but deployed via a DaemonSet primitive.
How the CSI plugin operates in Kubernetes (diagram)

You can get more details on how it works in this article: Understanding the CSI.
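From the user's perspective, consuming a CSI driver looks much like using any other dynamic provisioner: you reference the driver in a StorageClass and request volumes through PersistentVolumeClaims. A minimal sketch, assuming a hypothetical driver registered under the name csi.example.com (the class name and size are illustrative):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: example-csi-sc
provisioner: csi.example.com     # the name the CSI driver registers itself with
allowVolumeExpansion: true       # enables online resizing if the driver supports it
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-csi-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: example-csi-sc
  resources:
    requests:
      storage: 1Gi

Kubernetes then calls the driver's Controller service to provision the volume and the Node service to mount it on the target node.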

Advantages of this approach

  • For basic activities, such as registering a driver on the node, the Kubernetes developers provide a set of ready-made containers (see the sketch below). You no longer need to manually generate the JSON response with capabilities (as was the case with the FlexVolume plugin).
  • Instead of deploying executable files to nodes, we deploy pods to the cluster. That is what we expect from Kubernetes: everything occurs inside containers deployed via Kubernetes primitives.
  • To create complex drivers, you no longer need to develop an RPC server and an RPC client. The client is already implemented by the K8s developers.
  • Passing arguments over the gRPC protocol is much more convenient, flexible, and reliable than passing them via command-line arguments. If you want to see how volume metrics support can be added to a CSI driver via a standardized gRPC method, you can turn to our pull request for the vsphere-csi driver as an example.
  • Communication goes through IPC (Unix domain) sockets, which ensures that kubelet requests reach the right pod.

Does this list look familiar to you? Right, the advantages of CSI make up for the shortcomings of the FlexVolume plugin.
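To illustrate the first point from the list above, here is a rough sketch of how the Node component is usually wired together with the ready-made node-driver-registrar container (the driver image and the csi.example.com name are hypothetical; the socket paths follow the common convention, so check your driver's documentation for the exact values):

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: example-csi-node
spec:
  selector:
    matchLabels:
      app: example-csi-node
  template:
    metadata:
      labels:
        app: example-csi-node
    spec:
      containers:
        # the CSI driver itself; serves the Node gRPC service on a Unix socket
        - name: csi-driver
          image: example/csi-driver:latest   # hypothetical driver image
          args: ["--endpoint=unix:///csi/csi.sock"]
          volumeMounts:
            - name: socket-dir
              mountPath: /csi
        # ready-made sidecar that registers the driver with kubelet
        - name: node-driver-registrar
          image: registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.9.0
          args:
            - --csi-address=/csi/csi.sock
            - --kubelet-registration-path=/var/lib/kubelet/plugins/csi.example.com/csi.sock
          volumeMounts:
            - name: socket-dir
              mountPath: /csi
            - name: registration-dir
              mountPath: /registration
      volumes:
        - name: socket-dir
          hostPath:
            path: /var/lib/kubelet/plugins/csi.example.com/
            type: DirectoryOrCreate
        - name: registration-dir
          hostPath:
            path: /var/lib/kubelet/plugins_registry/
            type: Directory

The registrar announces the driver to kubelet through the plugins_registry directory, while the actual CSI calls go over the Unix socket shared by the two containers.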

Conclusion

As a standard for creating custom plugins for working with data storage, CSI was warmly welcomed by the community. Furthermore, thanks to its advantages and versatility, CSI drivers have been implemented even for storage systems such as Ceph and AWS EBS, which have had their own in-tree plugins since the very beginning.

Early in 2019, the in-tree plugins were declared deprecated. The FlexVolume plugin will continue to be maintained by the Kubernetes developers, but new functionality will only be added to CSI, not to FlexVolume.

We have accumulated extensive experience using the ceph-csi and vsphere-csi plugins and are ready to expand this list! So far, CSI is doing an excellent job; we'll wait and see how it pans out.

Remember, the more things change, the more they stay the same!
