Blog: Kubernetes 1.22: A New Design for Volume Populators

Authors:
Ben Swartzlander (NetApp)

Kubernetes v1.22, released earlier this month, introduced a redesigned approach for volume
populators. Originally implemented
in v1.18, the API suffered from backwards compatibility issues. Kubernetes v1.22 includes a new API
field called dataSourceRef that fixes these problems.

Data sources

Earlier Kubernetes releases already added a dataSource field into the
PersistentVolumeClaim API,
used for cloning volumes and creating volumes from snapshots. You could use the dataSource field when
creating a new PVC, referencing either an existing PVC or a VolumeSnapshot in the same namespace.
That also modified the normal provisioning process so that instead of yielding an empty volume, the
new PVC contained the same data as either the cloned PVC or the cloned VolumeSnapshot.

Volume populators embrace the same design idea, but extend it to any type of object, as long
as there exists a custom resource
to define the data source, and a populator controller to implement the logic. Initially,
the dataSource field was directly extended to allow arbitrary objects, if the AnyVolumeDataSource
feature gate was enabled on a cluster. That change unfortunately caused backwards compatibility
problems, and so the new dataSourceRef field was born.

In v1.22 if the AnyVolumeDataSource feature gate is enabled, the dataSourceRef field is
added, which behaves similarly to the dataSource field except that it allows arbitrary
objects to be specified. The API server ensures that the two fields always have the same
contents, and neither of them are mutable. The differences is that at creation time
dataSource allows only PVCs or VolumeSnapshots, and ignores all other values, while
dataSourceRef allows most types of objects, and in the few cases it doesn’t allow an
object (core objects other than PVCs) a validation error occurs.

When this API change graduates to stable, we would deprecate using dataSource and recommend
using dataSourceRef field for all use cases.
In the v1.22 release, dataSourceRef is available (as an alpha feature) specifically for cases
where you want to use for custom volume populators.

Using populators

Every volume populator must have one or more CRDs that it supports. Administrators may
install the CRD and the populator controller and then PVCs with a dataSourceRef specifies
a CR of the type that the populator supports will be handled by the populator controller
instead of the CSI driver directly.

Underneath the covers, the CSI driver is still invoked to create an empty volume, which
the populator controller fills with the appropriate data. The PVC doesn’t bind to the PV
until it’s fully populated, so it’s safe to define a whole application manifest including
pod and PVC specs and the pods won’t begin running until everything is ready, just as if
the PVC was a clone of another PVC or VolumeSnapshot.

How it works

PVCs with data sources are still noticed by the external-provisioner sidecar for the
related storage class (assuming a CSI provisioner is used), but because the sidecar
doesn’t understand the data source kind, it doesn’t do anything. The populator controller
is also watching for PVCs with data sources of a kind that it understands and when it
sees one, it creates a temporary PVC of the same size, volume mode, storage class,
and even on the same topology (if topology is used) as the original PVC. The populator
controller creates a worker pod that attaches to the volume and writes the necessary
data to it, then detaches from the volume and the populator controller rebinds the PV
from the temporary PVC to the orignal PVC.

Trying it out

The following things are required to use volume populators:

  • Enable the AnyVolumeDataSource feature gate
  • Install a CRD for the specific data source / populator
  • Install the populator controller itself

Populator controllers may use the lib-volume-populator
library to do most of the Kubernetes API level work. Individual populators only need to
provide logic for actually writing data into the volume based on a particular CR
instance. This library provides a sample populator implementation.

These optional components improve user experience:

  • Install the VolumePopulator CRD
  • Create a VolumePopulator custom respource for each specific data source
  • Install the volume data source validator
    controller (alpha)

The purpose of these components is to generate warning events on PVCs with data sources
for which there is no populator.

Putting it all together

To see how this works, you can install the sample “hello” populator and try it
out.

First install the volume-data-source-validator controller.

kubectl apply -f https://github.com/kubernetes-csi/volume-data-source-validator/blob/master/deploy/kubernetes/rbac-data-source-validator.yaml
kubectl apply -f https://github.com/kubernetes-csi/volume-data-source-validator/blob/master/deploy/kubernetes/setup-data-source-validator.yaml

Next install the example populator.

kubectl apply -f https://github.com/kubernetes-csi/lib-volume-populator/blob/master/example/hello-populator/crd.yaml
kubectl apply -f https://github.com/kubernetes-csi/lib-volume-populator/blob/master/example/hello-populator/deploy.yaml

Create an instance of the Hello CR, with some text.

apiVersion: hello.k8s.io/v1alpha1
kind: Hello
metadata:
  name: example-hello
spec:
  fileName: example.txt
  fileContents: Hello, world!

Create a PVC that refers to that CR as its data source.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Mi
  dataSourceRef:
    apiGroup: hello.k8s.io
    kind: Hello
    name: example-hello
  volumeMode: Filesystem

Next, run a job that reads the file in the PVC.

apiVersion: batch/v1
kind: Job
metadata:
  name: example-job
spec:
  template:
    spec:
      containers:
        - name: example-container
          image: busybox:latest
          command:
            - cat
            - /mnt/example.txt
          volumeMounts:
            - name: vol
              mountPath: /mnt
      restartPolicy: Never
      volumes:
        - name: vol
          persistentVolumeClaim:
            claimName: example-pvc

Wait for the job to complete (including all of its dependencies).

kubectl wait --for=condition=Complete job/example-job

And last examine the log from the job.

kubectl logs job/example-job
Hello, world!

Note that the volume already contained a text file with the string contents from
the CR. This is only the simplest example. Actual populators can set up the volume
to contain arbitrary contents.

How to write your own volume populator

Developers interested in writing new poplators are encouraged to use the
lib-volume-populator library
and to only supply a small controller wrapper around the library, and a pod image
capable of attaching to volumes and writing the appropriate data to the volume.

Individual populators can be extremely generic such that they work with every type
of PVC, or they can do vendor specific things to rapidly fill a volume with data
if the volume was provisioned by a specific CSI driver from the same vendor, for
example, by communicating directly with the storage for that volume.

The future

As this feature is still in alpha, we expect to update the out of tree controllers
with more tests and documentation. The community plans to eventually re-implement
the populator library as a sidecar, for ease of operations.

We hope to see some official community-supported populators for some widely-shared
use cases. Also, we expect that volume populators will be used by backup vendors
as a way to “restore” backups to volumes, and possibly a standardized API to do
this will evolve.

How can I learn more?

The enhancement proposal,
Volume Populators, includes lots of detail about the history and technical implementation
of this feature.

Volume populators and data sources, within the documentation topic about persistent volumes,
explains how to use this feature in your cluster.

Please get involved by joining the Kubernetes storage SIG to help us enhance this
feature. There are a lot of good ideas already and we’d be thrilled to have more!

Originally posted on Kubernetes – Production-Grade Container Orchestration
Author:

Deja una respuesta

Tu dirección de correo electrónico no será publicada. Los campos obligatorios están marcados con *