Running stateless workloads is much easier. By nature, they are just a set of static YAML files and Docker images. The YAML files typically live in a Git repository hosted on our side, on GitHub/Bitbucket, or somewhere else. Docker images have their own home, e.g., Docker Hub or Harbor. A stateless application does not contain any valuable data and can easily be killed and recreated with no critical data lost.
Nevertheless, almost any system includes some kind of stateful workload. So how do we create backups and restore all the dynamic data generated by such stateful applications? How do we back up databases, user uploads, and dynamically generated files? In this article, we will take a stateful application, prepare a backup, and demonstrate how it can be restored on demand. It is assumed that you already have the kubectl tool and Helm v3.x installed. Let’s go!
There are two primary approaches to storing backups in a Kubernetes environment. The first is to use an object store, such as AWS S3, DigitalOcean Object Storage, or Google Cloud Storage. In this case, backup tools prepare backups and push them to that storage. Here we will follow this approach with the Stash backup solution as an example.
The second way is to make block storage volume snapshots at the cloud level. For example, snapshots of AWS EBS or DigitalOcean Volumes Block Storage will, as you can guess, work only within AWS or DigitalOcean, respectively. We are going to test this method with Velero.
One technical challenge is organizing a data repository to store all backups. But there is another question - how do we extract dynamic data from inside the cluster and push it to the repository? A popular method involves two Kubernetes patterns: sidecars and init containers.
In this context, the sidecar pattern means injecting an additional auxiliary container into the application pod. That container runs inside the same pod in parallel with the main app container and has access to its volumes. The init container concept also means injecting an additional container, but an init container runs only during pod initialization, sequentially and strictly before the main app container starts. So if we need to run an init container, we need to recreate the pod.
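For illustration, a pod with a backup sidecar sharing the application's volume could look roughly like the sketch below (the backup-agent image here is hypothetical):
apiVersion: v1
kind: Pod
metadata:
  name: app-with-backup-sidecar
spec:
  volumes:
  - name: app-data
    emptyDir: {}
  containers:
  - name: app                      # main application container
    image: nginx
    volumeMounts:
    - name: app-data
      mountPath: /var/log/nginx
  - name: backup-agent             # hypothetical sidecar that reads the same volume and ships backups
    image: example.com/backup-agent:latest
    volumeMounts:
    - name: app-data
      mountPath: /var/log/nginx
      readOnly: true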
In most cases, backup solutions create Custom Resource Definitions (such as Backup, Restore, RestoreSession, etc.) inside the cluster, and a backup controller manages these entities natively in Kubernetes. For example, Velero queries the API server for resources directly and then sends them to storage with no containers injected.
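To give a taste of this approach, a Velero Backup custom resource for a single namespace looks roughly like this (Velero itself is covered below; the names here are just placeholders):
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: example-backup
  namespace: velero
spec:
  includedNamespaces:
  - example-namespace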
Velero is a tool (or even a toolset) that helps you back up and restore your cluster resources and persistent volumes. It can work with both application data and cluster objects. Velero consists of a server part, which is installed inside your cluster, and a client-side command-line tool for backup and restore management.
The commands below cover Velero client installation on a Mac, but instructions are also available for Linux and Windows (https://velero.io/docs/v1.4/basic-install). The server component can be installed afterward using the local Velero client or the official Velero Helm chart (https://vmware-tanzu.github.io/helm-charts/). Let’s review how to do it with the client. First, we need to install the Velero client itself:
$ brew install velero
$ velero version
If Velero returns its version, everything is okay at this step, and we can continue. Next, we need to install Velero into the cluster. There are at least two ways to do it, as I mentioned before - to use the command line or to run the Helm chart. Let’s use the CLI as follows:
$ velero install --provider aws \
--bucket velero-backup \
--secret-file ./velero-creds \
--use-volume-snapshots=true \
--backup-location-config region=":default-placement",s3ForcePathStyle="true",s3Url=http://rook-ceph-rgw-my-store.rook-ceph.svc.cluster.local \
--snapshot-location-config region=":default-placement" \
--use-restic
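For reference, the ./velero-creds file passed via --secret-file typically follows the standard AWS-style credentials format (the values below are placeholders for your S3 access keys):
[default]
aws_access_key_id = <ACCESS_KEY_ID>
aws_secret_access_key = <SECRET_ACCESS_KEY>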
When Velero is installed, we can move on to an example case. Let’s create something straightforward so that we can make a backup, destroy the original data, and restore everything. Nginx with a persistent volume is a good test case. The specific steps are described below.
To continue with the sandbox, download the example manifest from the Velero GitHub repository: https://github.com/vmware-tanzu/velero/blob/master/examples/nginx-app/with-pv.yaml. Save it locally as with-pv.yaml. Then let’s deploy our test workload:
$ kubectl apply -f with-pv.yaml
We expect the application to generate dynamic data in the /var/log/nginx folder, and all that content will be backed up and then restored.
It is time to prepare a new snapshot. Here I need to remind you that Velero backs up data and also saves all Kubernetes entities. Please pay attention to the --include-namespaces parameter. It means that even if the whole namespace with your application is lost, Velero can restore it. Let’s try it:
$ velero backup create nginx-backup --include-namespaces nginx-example
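While waiting, the backup status can also be checked from the CLI:
$ velero backup describe nginx-backup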
At this step, you can go to your cloud provider’s management console and check that the snapshot was created. The process is asynchronous, so it can take some time before everything is completed. Once it is, we are ready to kill the test workload to simulate a disaster.
$ kubectl delete namespaces nginx-example
After this step, it is best to open the cloud management console and wait until the disk is destroyed. Once that happens, we have confirmation that no working data is left, so it is time to restore everything.
$ velero restore create --from-backup nginx-backup
Let’s wait a bit and then check that our workload is restored.
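For example, you can check the restore status and the recreated resources like this:
$ velero restore get
$ kubectl get all -n nginx-example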
Stash is a cloud-native data backup and recovery solution for Kubernetes workloads. It works as a Kubernetes operator that uses Restic or the Kubernetes CSI Driver VolumeSnapshotter to make backups. One exciting thing about Stash is that it can back up not only volumes but also database dumps. Currently, PostgreSQL, MySQL, MongoDB, Elasticsearch, and Percona XtraDB databases are supported.
To install Stash into the cluster, we need Helm. Even though both Helm v2 and Helm v3 are supported, it is best not to use Helm v2 anymore, as a later migration to the newest version would introduce additional complexity. Let’s add the repository and install Stash with Helm 3:
$ helm repo add appscode https://charts.appscode.com/stable/
$ helm repo update
$ helm search repo appscode/stash --version v0.9.0-rc.6
$ helm install stash appscode/stash \
--version v0.9.0-rc.6 \
--namespace kube-system
Before we start with the backup settings, it is worth preparing a workload as a sandbox. Let’s use examples from Stash’s official documentation. For our experiment, we will use a PVC (make sure you have available PVs in your cluster) and a Busybox deployment.
$ kubectl apply -f https://github.com/stashed/docs/raw/v0.9.0-rc.6/docs/examples/guides/latest/workloads/deployment/pvc.yaml
$ kubectl apply -f https://github.com/stashed/docs/raw/v0.9.0-rc.6/docs/examples/guides/latest/workloads/deployment/deployment.yaml
If you look inside the deployment manifest, you will see, among others, the following lines:
spec:
  containers:
  - args: ["echo sample_data > /source/data/data.txt && sleep 3000"]
    command: ["/bin/sh", "-c"]
It means that our test application generates some dynamic data at /source/data/, which should be backed up. At this step, every pod has only one container inside. You can check it with the following command (look at the READY column in the response):
$ kubectl get pod -n demo
Stash uses S3 buckets and other cloud storage as backends for storing backups. More backend options are available (e.g., Kubernetes volumes, GCS, Azure Blob Storage, etc.), and we will try the popular Google Cloud Storage option here. To push backups to GCS, we first need to create a credentials Secret and a Restic password. Provide your GCS credentials in the files below and pick any Restic password.
$ kubectl create ns demo
$ echo -n 'changeme' > RESTIC_PASSWORD
$ echo -n 'google-project-id' > GOOGLE_PROJECT_ID
$ cat downloaded-sa-json.key > GOOGLE_SERVICE_ACCOUNT_JSON_KEY
$ kubectl create secret generic -n demo gcs-secret \
--from-file=./RESTIC_PASSWORD \
--from-file=./GOOGLE_PROJECT_ID \
--from-file=./GOOGLE_SERVICE_ACCOUNT_JSON_KEY
Now we need to create a new repository with GCS as its backend and then prepare a backup configuration. For this, we can use the manifests from the official documentation again. First, we create the repository. It is a Stash custom resource that holds the repository name and backend credentials:
$ kubectl apply -f https://github.com/stashed/docs/raw/v0.9.0-rc.6/docs/examples/guides/latest/workloads/deployment/repository.yaml
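For reference, the applied manifest looks roughly like this (the bucket name and prefix are placeholders; the Secret is the one we created above):
apiVersion: stash.appscode.com/v1alpha1
kind: Repository
metadata:
  name: gcs-repo
  namespace: demo
spec:
  backend:
    gcs:
      bucket: <your-gcs-bucket>
      prefix: /source/data
    storageSecretName: gcs-secret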
Feel free to open the link on GitHub to study all the parameters in detail. The next thing is to create a backup configuration, one more custom Stash entity:
$ kubectl apply -f https://github.com/stashed/docs/raw/v0.9.0-rc.6/docs/examples/guides/latest/workloads/deployment/backupconfiguration.yaml
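Approximately, the BackupConfiguration from that manifest looks like this (the target Deployment name follows the official example and may differ in your setup):
apiVersion: stash.appscode.com/v1beta1
kind: BackupConfiguration
metadata:
  name: deployment-backup
  namespace: demo
spec:
  repository:
    name: gcs-repo
  schedule: "*/5 * * * *"          # run a backup every 5 minutes
  target:
    ref:
      apiVersion: apps/v1
      kind: Deployment
      name: stash-demo             # Deployment name assumed from the official example
    volumeMounts:
    - name: source-data
      mountPath: /source/data
    paths:
    - /source/data
  retentionPolicy:
    name: keep-last-5
    keepLast: 5
    prune: true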
In the backup configuration, we provide more details about what should be saved, the expected schedule, and the retention policy. After the repository and backup configuration entities are created, we can see a sidecar injected into every pod and the new backup repository appear on the Google Cloud Storage side. Recheck the READY column:
$ kubectl get pod -n demo
Then we need to wait up to 5 minutes, according to the backup schedule from the example above. You can check the backup status at any time with the following command:
$ kubectl get repository -n demo gcs-repo
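In this Stash version, each scheduled run also produces a BackupSession object, which you can inspect to see whether an individual backup succeeded:
$ kubectl get backupsession -n demo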
The restoration procedure in Stash is a bit different from Velero’s. To restore our application data, we recreate the workload (create another instance of the application) and restore the backup into that new instance.
To avoid making backups of the old deployment in the meantime, let’s stop (or at least pause) its backup configuration.
$ kubectl patch backupconfiguration -n demo deployment-backup --type="merge" --patch='{"spec": {"paused": true}}'
After it is paused, we can proceed with deploying a new PVC and Deployment.
$ kubectl apply -f https://github.com/stashed/docs/raw/v0.9.0-rc.6/docs/examples/guides/latest/workloads/deployment/recovered_deployment.yaml
When the new instance of our workload is up, it is time to create a restore session. It is another Stash custom resource, meant to restore the data from the backup.
$ kubectl apply -f https://github.com/stashed/docs/raw/v0.9.0-rc.6/docs/examples/guides/latest/workloads/deployment/restoresession.yaml
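The applied RestoreSession points the same repository at the new Deployment and looks roughly like this (the recovered Deployment name is assumed from the official example):
apiVersion: stash.appscode.com/v1beta1
kind: RestoreSession
metadata:
  name: deployment-restore
  namespace: demo
spec:
  repository:
    name: gcs-repo
  target:
    ref:
      apiVersion: apps/v1
      kind: Deployment
      name: stash-recovered        # recovered Deployment name assumed from the official example
    volumeMounts:
    - name: source-data
      mountPath: /source/data
  rules:
  - paths:
    - /source/data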
Voila! That’s mainly it. Be prepared that restoring data from remote cloud storage is not an instant process; once it finishes, you will see a new workload instance with the original data inside.
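You can follow the progress through the RestoreSession status, for example:
$ kubectl get restoresession -n demo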
Current cloud backup solutions try to be quite universal, which is why backups are not the most straightforward domain in cluster operations. Another cause of complexity is the variety of clouds and protocols: it gives us more flexibility but still requires separate, provider-specific configuration.
Velero and Stash provide different approaches to backing up stateful applications. Velero stores volumes along with all Kubernetes entities, like Pods, PVCs, etc. It can create a snapshot of the entire cluster. When it comes to databases, Velero stores the whole database data folder and lets you flush in-memory buffers before taking a snapshot.
Stash is more focused on volume backups and database dumps but is expanding its functionality as well. Velero mostly operates on cloud resources and API calls, while Stash looks more file-oriented. Organizations that are heavy on automation and have a robust GitOps pipeline might prefer Stash over Velero because they can rely on the pipeline to restore Kubernetes entities and expect Stash only to restore volumes.
Both solutions can store the data in cloud storage (e.g., S3 buckets) or use the Kubernetes VolumeSnapshot API. Other storage types, like persistent volumes, are also available.