Kubernauts

All around Cloud-Native and Open Source Technology


Backup and Restore of Kubernetes Applications using Heptio’s Velero with Restic and Rook-Ceph as the storage provider


Quick question! How many times have you read a sentence like “Take a backup before proceeding to avoid losing data”? Almost every time before doing a critical operation, right?

Yep! Backups are important, all the more so in a Kubernetes environment, where service-interrupting events are inevitable and, true to Murphy’s law, can occur at any time. So no matter how highly available or resilient your Kubernetes cluster is, it is of utmost importance to have a solid backup and restore plan, or a disaster recovery plan, call it what you will.

In this blog post we will be using Rook as our underlying storage provider and Heptio’s Velero [previously known as Ark] to back up and restore our application’s PVCs [WordPress].

A little bit about Ark / Velero

Heptio’s Ark, now known as Velero, has become the de-facto backup tool for Kubernetes clusters. In addition to backing up cluster objects, it takes snapshots of your cluster’s Persistent Volumes using your cloud provider’s block storage snapshot features, and can then restore your cluster’s objects and Persistent Volumes to a previous state.

You can use Velero to perform full backups or backups of only specific namespaces or resource types, and you can schedule backups to run periodically.
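
For example, these are the kinds of commands you would run; the backup, schedule, and namespace names here are placeholders:

$ velero backup create full-cluster-backup
$ velero backup create app-backup --include-namespaces my-namespace
$ velero schedule create nightly-backup --schedule="0 2 * * *"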

Why Back up Kubernetes?

There are myriad reasons why you need a backup and recovery mechanism for your Kubernetes cluster. For now, let’s categorize them broadly into a few groups.

  1. To recover from disasters, such as:
  • Someone accidentally deleted a namespace.
  • A Kubernetes upgrade failed and you need to revert.
  • The network went down.
  • The cluster went into an unrecoverable state.
  • The latest application push introduced a critical bug that wiped a persistent volume and you lost the data.
  • The rare case of a natural disaster making your cluster inaccessible.

2. Replicate the environment for debugging, development, staging, or before a major upgrade.

3. Migration of Kubernetes cluster from one environment to another.

What to Back up?

We’ve looked at the Why, now comes the next question of What.

There are two things you need to back up:

  1. Kubernetes stores its state in etcd, so to restore the master, etcd and the relevant certificates must be backed up. This post won’t cover backing up etcd, as it’s quite well documented here (a minimal snapshot sketch follows this list).
  2. Application data, i.e. persistent volumes, because let’s face it, you will have stateful applications running on your cluster. This is the point we cover in this post.
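
For point 1, a minimal sketch of an etcd snapshot looks roughly like the following, assuming etcdctl v3 and the default kubeadm certificate paths; adjust the endpoint and paths to your own control plane:

$ ETCDCTL_API=3 etcdctl snapshot save /var/backups/etcd-snapshot.db \
    --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key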

Pre-requisites

  • A Kubernetes cluster > 1.10 (I’ll be using a multi-node Vagrant cluster)
  • A storage provider for our application workloads (I’ll be using Rook-Ceph)

Let’s set up our Infrastructure

Kubernetes Cluster

In this blog post we are proceeding with a local, throwaway Vagrant cluster. You can also test this out on bare metal; the difference, of course, will be in setting up Rook + Ceph as the storage provider. The steps below are for a multi-node Vagrant cluster.

Please ensure that Vagrant and your choice of hypervisor (VirtualBox, libvirt, etc.) are installed before proceeding.

Clone the git repo and execute the following commands to set up our 1 master, 2 worker node Kubernetes cluster:

$ mkdir $HOME/velero-rook-tutorial
$ cd $HOME/velero-rook-tutorial
$ git clone https://github.com/ipochi/k8s-bkp-restore.git
$ cd k8s-bkp-restore
$ vagrant up
$ vagrant ssh master
$ sudo cp /etc/kubernetes/admin.conf .
$ sudo chown vagrant:vagrant admin.conf
$ exit
$ scp vagrant@192.168.205.10:/home/vagrant/admin.conf .
# use vagrant as the password
$ export KUBECONFIG=`pwd`/admin.conf
$ alias k=kubectl   # optional shorthand used in some of the snippets below
$ k get nodes

Rook + Ceph

Clone the Rook git repo and execute the commands below to set up the Rook operator with Ceph as the storage provider. Additionally, we create a storage class so that volumes can be provisioned dynamically.

$ git clone https://github.com/rook/rook.git
$ cd rook/cluster/examples/kubernetes/ceph
$ kubectl create -f operator.yaml
$ kubectl create -f cluster.yaml
$ kubectl create -f storageclass.yaml
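
To confirm the storage class exists before moving on (the name below assumes the stock rook-ceph-block example shipped with the Rook manifests):

$ kubectl get storageclass rook-ceph-block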

Hopefully you didn’t encounter any issues and our infrastructure is up and running.

$ k get pods -n rook-ceph
NAME READY STATUS RESTARTS AGE
rook-ceph-mgr-a-764f85d6b9-j7nmh 1/1 Running 3 45h
rook-ceph-mon-a-5dbfffc67-qlmrw 1/1 Running 3 45h
rook-ceph-mon-c-98f9c667d-cg7dz 1/1 Running 3 45h
rook-ceph-mon-d-747fdb9d9f-qq68r 1/1 Running 1 27h
rook-ceph-osd-0-6f74c6d4b8-gpv42 1/1 Running 3 45h
rook-ceph-osd-1-b559ddd7c-fwhrx 1/1 Running 3 45h
rook-ceph-osd-prepare-w1-8vv84 0/2 Completed 0 27h
rook-ceph-osd-prepare-w2-gn5l7 0/2 Completed 0 27h
$ k get pods -n rook-ceph-system
NAME READY STATUS RESTARTS AGE
rook-ceph-agent-8rmqq 1/1 Running 7 45h
rook-ceph-agent-d8289 1/1 Running 6 45h
rook-ceph-operator-b996864dd-tqqh6 1/1 Running 3 45h
rook-discover-m4b45 1/1 Running 3 45h
rook-discover-sh24w 1/1 Running 3 45h

Velero Setup

Velero consists of a client installed on your local computer and a server that runs in your Kubernetes cluster, like Helm.

Installing Velero Client

Navigate to the Velero GitHub repo releases page, find the latest release corresponding to your OS and system architecture, and copy the link address.

$ cd $HOME/velero-rook-tutorial
$ wget https://<link-copied-from-releases-page>

Extract the tarball (adjust the version and platform to match your download) and move the velero binary to /usr/local/bin:

$ tar -xvzf velero-v0.11.0-darwin-amd64.tar.gz 
$ sudo mv velero /usr/local/bin/
$ velero help

Installing Velero Server

Installing pre-requisites

$ kubectl apply -f config/common/00-prereqs.yaml

What do the pre-requisites install on our cluster? (A quick verification sketch follows this list.)

  • A velero Namespace
  • The velero Service Account
  • Role-based access control (RBAC) rules to grant permissions to the velero Service Account
  • Custom Resource Definitions (CRDs) for the Velero-specific resources.
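
A quick sanity check might look like the commands below; note that depending on the version of the manifests in your copy of the repo, the namespace may be velero or heptio-ark, so treat the names here as assumptions:

$ kubectl get crds | grep velero.io
$ kubectl get ns velero
$ kubectl -n velero get serviceaccount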

Velero needs an S3-compatible object store for its backups. You can check the support matrix here.

In this blog post we will be using a locally deployed Minio server to store our backups.

The steps below deploy Velero using Minio as the object store for backups. Should you wish to use another cloud provider’s object storage, follow the guidelines mentioned in the support matrix.

The commands below set up a local Minio server as well as the Velero server pods.

Please note that the Minio ACCESS KEY is minio and the SECRET KEY is minio123.

$ kubectl apply -f config/minio/00-minio-deployment.yaml
$ kubectl apply -f config/minio/05-backupstoragelocation.yaml
$ kubectl apply -f config/minio/20-deployment.yaml
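
For reference, the backup storage location applied above looks roughly like the snippet below in the upstream Velero Minio example; the bucket, namespace, and s3Url values are assumptions and may differ in your copy of the repo:

apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  name: default
  namespace: velero
spec:
  provider: aws
  objectStorage:
    bucket: velero
  config:
    region: minio
    s3ForcePathStyle: "true"
    s3Url: http://minio.velero.svc:9000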

How Does Velero Work?

When you run velero backup create test-backup:

  1. The Velero client makes a call to the Kubernetes API server to create a Backup object.
  2. The BackupController notices the new Backup object and performs validation.
  3. The BackupController begins the backup process. It collects the data to back up by querying the API server for resources.
  4. The BackupController makes a call to the object storage service – for example, AWS S3 – to upload the backup file.
[Image credit: heptio/velero]

Restic Plugin

Starting with version 0.9, thanks to Restic support, Velero can back up almost any type of Kubernetes volume, regardless of the underlying storage provider.

This blog post showcases that functionality by not tying ourselves to any cloud provider for the Kubernetes cluster or for persistent storage. Instead, we use the open source Rook-Ceph as our persistent storage provider.

Setting up Velero with restic:

$ kubectl apply -f config/minio/30-restic-daemonset.yaml
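
You can confirm the restic daemonset is running on each worker node; the daemonset and namespace names below assume the stock manifests (the namespace may be velero or heptio-ark depending on your version):

$ kubectl -n velero get daemonset restic
$ kubectl -n velero get pods | grep restic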

How does Restic work with Velero?

Three more Custom Resource Definitions and their associated controllers are introduced for Restic support.

Restic Repository

  • Manages the lifecycle of Velero’s restic repositories.
  • Creates a restic repository per namespace.
  • The controller for this custom resource executes restic repository lifecycle commands — restic init, restic check, and restic prune.

PodVolumeBackup

  • Represents a restic backup of a volume in a pod.
  • The main Velero backup process creates one or more of these when it finds an annotated pod.
  • The associated controller executes restic backup commands to back up pod volume data.

PodVolumeRestore

  • Represents a restic restore of a pod volume.
  • The main Velero restore process creates one or more of these when it encounters a pod that has associated restic backups.
  • The associated controller executes restic restore commands to restore pod volume data.
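
Once backups and restores start running, you can inspect these custom resources directly with kubectl; as above, the namespace is an assumption that depends on your manifests:

$ kubectl -n velero get resticrepositories
$ kubectl -n velero get podvolumebackups
$ kubectl -n velero get podvolumerestores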

Now that we have our infrastructure and Velero set up, let’s deploy an application, try out persistent storage using Rook, execute a disaster scenario, and of course restore our application.

Deploying WordPress

Apply the YAML files with kubectl and we are good to go. We will be deploying into the wordpress namespace.

$ kubectl create ns wordpress
$ kubectl apply -f app/mysql.yaml -n wordpress
$ kubectl apply -f app/wordpress.yaml -n wordpress

One thing to note here is the change we’ve made to the storage class name, so that it matches the storage class we created when deploying Rook. This tells Kubernetes to have Rook provision the requested storage.

Below is the relevant snippet from wordpress.yaml; you’ll see something similar in mysql.yaml.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: wp-pv-claim
  labels:
    app: wordpress
spec:
  storageClassName: rook-ceph-block
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi

If everything went well, you’ll have WordPress up and running in a few minutes. Let’s confirm that by checking that the PVCs are bound, the pods are running, and the service has been assigned NodePort 32555.

$ kubectl get pvc -n wordpress
NAME STATUS VOLUME CAPACITY ACCESS STORAGECLASS AGE
mysql-pv-claim Bound <pvc> 2Gi RWO rook-ceph-block 17h
wp-pv-claim Bound <pvc> 2Gi RWO rook-ceph-block 17h
$ kubectl get svc -n wordpress
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
wordpress NodePort 10.104.89.235 <none> 80:32555/TCP 18h
wordpress-mysql ClusterIP None <none> 3306/TCP 18h
$ k get pods -n wordpress
NAME READY STATUS RESTARTS AGE
wordpress-7b6c4c79bb-4vnc6 1/1 Running 0 5h29m
wordpress-mysql-6887bf844f-vndtp 1/1 Running 0 5h29m

Let’s add some data: open the UI at http://192.168.205.10:32555, proceed with the installation by providing the details, and create some posts and comments so that we can verify our data still exists after simulating a disaster scenario.

Backup

There are a couple of ways to take a backup using Velero.

  1. Take a backup of an entire namespace:
$ velero backup create <bkp-name> --include-namespaces <namespace-name>

2. Annotate the pods you’d like to back up; only the volumes named in the annotation will be included in the backup. The general form of the command is below.

$ kubectl -n YOUR_POD_NAMESPACE annotate pod/YOUR_POD_NAME backup.velero.io/backup-volumes=YOUR_VOLUME_NAME_1,YOUR_VOLUME_NAME_2,...

Let’s annotate our pods for a clearer understanding:

$ kubectl -n wordpress annotate pod/<WORDPRESS_POD_NAME> backup.velero.io/backup-volumes=wordpress-persistent-storage
$ kubectl -n wordpress annotate pod/<MYSQL_POD_NAME> backup.velero.io/backup-volumes=mysql-persistent-storage

Now that our pods are annotated, let’s create a backup:

$ velero backup create wp-backup
Backup request "wp-backup" submitted successfully.
Run `velero backup describe wp-backup` or `velero backup logs wp-backup` for more details.
$ velero backup describe wp-backup --details
Name: wp-backup
Namespace: heptio-ark
Labels: velero.io/storage-location=default
Annotations: <none>
Phase: Completed
Namespaces:
  Included: *
  Excluded: <none>
Resources:
  Included: *
  Excluded: <none>
  Cluster-scoped: auto
Label selector: <none>
Storage Location: default
Snapshot PVs: auto
TTL: 720h0m0s
Hooks: <none>
Backup Format Version: 1
Started: 2019-02-20 12:21:04 +0000 UTC
Completed: 2019-02-20 12:23:03 +0000 UTC
Expiration: 2019-03-22 12:21:04 +0000 UTC
Validation errors: <none>
Persistent Volumes: <none included>
Restic Backups:
  Completed:
    wordpress/wordpress-7b6c4c79bb-4vnc6: wordpress-persistent-storage
    wordpress/wordpress-mysql-6887bf844f-vndtp: mysql-persistent-storage
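
You can also list all backups and their current phase with:

$ velero backup get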

Scheduled Backups

Taking a backup manually usually happens only in an emergency or for learning purposes. The real essence of a backup and disaster recovery plan is scheduled backups. Velero provides that support in a rather simple manner.

$ velero schedule create daily-wordpress-backup --schedule="0 10 * * *" --include-namespaces wordpress
Schedule "daily-wordpress-backup" created successfully.

Simulate Disaster Scenario

Let’s “accidentally” delete the wordpress namespace.

$ kubectl delete namespace wordpress

Yikes !! My data :(

Restore Application and data

Restore in Velero is pretty straightforward.

$ velero restore create --from-backup wp-backup 
Restore request "wp-backup-20190220062517" submitted successfully.
Run `velero restore describe wp-backup-20190220062517` or `velero restore logs wp-backup-20190220062517` for more details.

You can check the details of the restore by running the commands mentioned above [substitute your restore name].

After some time you should see that the wordpress namespace is back and the wordpress and mysql pods are running again.

To verify that your data wasn’t lost, fire up the WordPress UI; you’ll see the posts and comments you created earlier, after deploying the application for the first time.
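
Concretely, the verification might look like this; the pod names and NodePort will match whatever was captured in the backup:

$ velero restore get
$ kubectl get pods,pvc,svc -n wordpress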

Troubleshooting

If you face any issues setting up the Kubernetes cluster, please make sure you have enough physical resources to spin up 3 VMs. If not, you can modify the Vagrantfile as mentioned in the repository’s README to increase or decrease the number of nodes.

For issues related to Velero, there are a few commands that may be helpful:

$ velero backup describe <backupName>
$ velero backup logs <backupName>
$ velero restore describe <restoreName>
$ velero restore logs <restoreName>
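
If a backup or restore appears stuck, the Velero server and restic pod logs are the next place to look; the deployment, daemonset, and namespace names below assume the stock manifests:

$ kubectl -n velero logs deployment/velero
$ kubectl -n velero logs daemonset/restic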

For comprehensive troubleshooting regarding Velero, please follow this link.

Cleanup

If you don’t need the cluster anymore, you can go ahead and destroy it:

$ cd $HOME/velero-rook-tutorial/k8s-bkp-restore
$ vagrant destroy -f
$ rm -rf $HOME/velero-rook-tutorial

Takeaways

We learned that backups are important and that a good disaster recovery strategy can help you get out of uncomfortable, gut-wrenching “data-loss” situations.

Yes, there are limitations to what Velero can and cannot do.

  • It does not support the migration of persistent volumes across cloud providers.
  • Velero + Restic currently supports backing up only to S3-compatible object storage.

What’s more to come?

You will have noticed that we restored our WordPress application into the same cluster; usually, in the event of a disaster, we have to assume that the old cluster no longer exists.

In the next post, we will talk about setting up a storage bucket on a cloud provider and use different clusters for backup and restore.

UPDATE — Second part of this article is published and can be read here
