A Guide to Disaster Recovery in the Kubernetes Cluster

So far, we have covered various Kubernetes topics. In this article, let's look at another important one: disaster recovery in the Kubernetes cluster. As Kubernetes usage grows everywhere, it is important to make industry-standard processes part of your cluster implementation and configuration. Backup is one of those processes: it is what lets you recover your Kubernetes cluster from a major failure.

Why Do We Need Backup and Recovery?

  1. To recover the cluster from disasters: for example, someone accidentally deletes the namespace where your deployments reside.
  2. To replicate the environment: you want to replicate your production environment to a staging environment before a major upgrade.
  3. To migrate a Kubernetes cluster: you want to move your Kubernetes cluster from one environment to another.

What to Back Up?

  1. The state of your Kubernetes control plane is stored in etcd, so you need to back up the etcd state to capture all the Kubernetes resources.
  2. If you have stateful containers (which you will in the real world), you need a backup of the persistent volumes as well.

Best practices for Kubernetes disaster recovery

  • Understand the backup requirement

It is important to understand what to back up and how critical it is. For example, if you run your Kubernetes cluster in a cloud environment with GitOps, backing up cluster resources is less of a concern because all your changes live in Git, and you can focus on backing up volumes if you use any. Understand your backup requirements in this way and plan for them. This article assumes you are running the cluster on bare metal and shows one possible way to take backups. If you wish to adopt the GitOps approach, you can follow our ArgoCD series.

  • Have a restore plan

You should have detailed steps and a plan for restoring the backup in case anything happens. Always test the restore at least twice, in different environments, so you are confident when a real incident occurs. Document each step with a detailed explanation so that anyone can perform it quickly.

  • Application-aware backups

Kubernetes’ portability is a double-edged sword: it makes it easy to build new applications from existing services and eases migration to different environments, but it also means a backup has to capture more than raw data. Although many workloads running on Kubernetes are stateless, it is important to have application-aware backups that provide context about the backup and the different components involved in it. This can be done with the help of a Kubernetes backup solution. Organizations can automate the entire backup and recovery process to reduce the risk of failures. These solutions also provide options to store the backups in various locations and help make restoring to a brand-new environment a breeze.

  • Security is key

We need to protect our backups from attackers. Organizations can make the mistake of slacking on backup security; however, your application is only as secure as your backup. To prevent unauthorized access to backups, organizations should employ identity and access management (IAM) or role-based access control (RBAC). Only the members assigned to monitor or verify backups should be given access rights. Another important measure against attacks is data encryption. Organizations can also invest in a disaster recovery solution that takes care of backup security for them.

Requirement

  1. A running Kubernetes cluster. You can use https://foxutech.com/setup-a-multi-master-kubernetes-cluster-with-kubeadm/ to set up the cluster in your environment.

ETCD Backup

How you take an etcd backup depends on which type of etcd cluster you run:

  1. Internal etcd cluster: you run your etcd cluster as containers/pods inside the Kubernetes cluster, and Kubernetes is responsible for managing those pods.
  2. External etcd cluster: you run the etcd cluster outside the Kubernetes cluster, mostly as Linux services, and provide its endpoints to the Kubernetes cluster to write to.

Backup Strategy for Internal Etcd Cluster:

Note: Store the backup in an external or otherwise secure location that is itself backed up properly or highly available, such as cloud volumes.
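
One way to follow this note, once a snapshot file exists, is to copy it to a second location and confirm that the copy is intact. The following is only a sketch: the function name copy_verified and the destination path in the usage line are made up for illustration, and it assumes the external storage is already mounted as a directory and that sha256sum (GNU coreutils) is available.

```shell
#!/bin/sh
# Hypothetical helper: copy a snapshot to a second location and
# confirm that the copy is byte-for-byte intact.
# $1 = snapshot file, $2 = destination directory (e.g. a mounted remote volume)
copy_verified() (
    src="$1"
    dest="$2/$(basename "$1")"
    mkdir -p "$2" || exit 1
    cp "$src" "$dest" || exit 1
    # Compare SHA-256 digests of the source and the copy.
    src_sum="$(sha256sum < "$src")"
    dest_sum="$(sha256sum < "$dest")"
    if [ "$src_sum" = "$dest_sum" ]; then
        printf '%s\n' "$dest"
    else
        echo "checksum mismatch for $dest" >&2
        exit 1
    fi
)
```

For example, copy_verified /data/backup/<snapshot file> /mnt/offsite would print the path of the verified copy; /mnt/offsite is a placeholder for wherever your external volume is mounted.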

Command:

# ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/client.crt --key=/etc/kubernetes/pki/etcd/client.key snapshot save /backup/etcd-snapshot-$(date +%Y-%m-%d_%H:%M:%S_%Z).db
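
The one-liner above can be wrapped in a small script so that the endpoint, certificate directory, and target directory are set in one place. This is a minimal sketch, not the author's script: the function names snapshot_path and etcd_backup and the ETCD_ENDPOINT, ETCD_CERT_DIR, and BACKUP_DIR variables are invented for illustration, and the defaults simply repeat the values used in the command above.

```shell
#!/bin/sh
# Hypothetical wrapper around the etcdctl command shown above.
# Variable names are illustrative; defaults repeat the article's values.
ETCD_ENDPOINT="${ETCD_ENDPOINT:-https://127.0.0.1:2379}"
ETCD_CERT_DIR="${ETCD_CERT_DIR:-/etc/kubernetes/pki/etcd}"
BACKUP_DIR="${BACKUP_DIR:-/backup}"

# Print a timestamped snapshot path inside the directory given as $1.
snapshot_path() {
    printf '%s/etcd-snapshot-%s.db\n' "$1" "$(date +%Y-%m-%d_%H:%M:%S_%Z)"
}

# Take a snapshot and, on success, print the file that was written.
etcd_backup() (
    target="$(snapshot_path "$BACKUP_DIR")"
    ETCDCTL_API=3 etcdctl \
        --endpoints="$ETCD_ENDPOINT" \
        --cacert="$ETCD_CERT_DIR/ca.crt" \
        --cert="$ETCD_CERT_DIR/client.crt" \
        --key="$ETCD_CERT_DIR/client.key" \
        snapshot save "$target" && printf '%s\n' "$target"
)
```

After sourcing the file on a control-plane node, running etcd_backup as root takes one snapshot; setting BACKUP_DIR beforehand changes where it is written.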

If you do not know the etcd details (endpoint and certificate paths), you can find them with the command below. The etcd pod is named etcd-<node name>; in this example the node is k8s-master.

# kubectl get pods etcd-k8s-master -n kube-system -o=jsonpath='{.spec.containers[0].command}' | jq
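
The output of that command lists every flag etcd was started with, and only a few of them matter for etcdctl. As a rough sketch, the filter below keeps just the client URL and TLS flags; the function name etcd_client_flags is made up for illustration, and it expects one flag per line on standard input.

```shell
#!/bin/sh
# Hypothetical filter: keep only the etcd flags that matter when
# building an etcdctl command. Reads one flag per line on stdin.
etcd_client_flags() {
    grep -E -- '^--(listen-client-urls|advertise-client-urls|cert-file|key-file|trusted-ca-file)='
}

# Possible usage (not run here; needs kubectl, jq, and cluster access):
#   kubectl get pods etcd-k8s-master -n kube-system \
#     -o=jsonpath='{.spec.containers[0].command}' | jq -r '.[]' | etcd_client_flags
```

The value of --trusted-ca-file is what etcdctl takes as --cacert, and one of the client URLs is the --endpoints value. The --cert-file and --key-file values are the certificate etcd itself serves with; whether you pass those to etcdctl or use a separate client certificate depends on your setup.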

CronJob:

# batch/v1beta1 was removed in Kubernetes 1.25; use batch/v1 on Kubernetes 1.21 and later
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: backup
  namespace: kube-system
spec:
  # activeDeadlineSeconds: 100
  # Runs every minute; adjust the schedule to your needs
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: backup
            # Same image as in /etc/kubernetes/manifests/etcd.yaml
            image: k8s.gcr.io/etcd:3.2.24
            env:
            - name: ETCDCTL_API
              value: "3"
            command: ["/bin/sh"]
            args: ["-c", "etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/client.crt --key=/etc/kubernetes/pki/etcd/client.key snapshot save /backup/etcd-snapshot-$(date +%Y-%m-%d_%H:%M:%S_%Z).db"]
            volumeMounts:
            - mountPath: /etc/kubernetes/pki/etcd
              name: etcd-certs
              readOnly: true
            - mountPath: /backup
              name: backup
          restartPolicy: OnFailure
          hostNetwork: true
          volumes:
          - name: etcd-certs
            hostPath:
              path: /etc/kubernetes/pki/etcd
              type: DirectoryOrCreate
          - name: backup
            hostPath:
              path: /data/backup
              type: DirectoryOrCreate
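
With a schedule of one snapshot per minute, /data/backup on the node fills up quickly, and the manifest itself does not remove old snapshots. One possible way to handle that, sketched here under assumptions, is to keep only the newest few files. The function name prune_keep_newest is invented for illustration, and the approach relies on the timestamped file names sorting in chronological order.

```shell
#!/bin/sh
# Hypothetical cleanup helper: keep the newest N snapshots in a
# directory and delete the rest.
# $1 = snapshot directory, $2 = number of snapshots to keep
prune_keep_newest() (
    dir="$1"
    keep="$2"
    # Names embed YYYY-MM-DD_HH:MM:SS, so a reverse sort lists newest first;
    # tail then selects everything after the first $keep entries.
    ls -1 "$dir"/etcd-snapshot-*.db 2>/dev/null \
        | sort -r \
        | tail -n +"$((keep + 1))" \
        | while IFS= read -r old; do
              rm -f -- "$old"
          done
)
```

For example, prune_keep_newest /data/backup 60 would leave roughly the last hour of snapshots in place when the job runs every minute.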

We can check the snapshot status. Replace the file name below with the timestamped snapshot file that the backup created.

# ETCDCTL_API=3 etcdctl --write-out=table snapshot status /backup/etcd-snapshot.db
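
Because the snapshot names carry a timestamp, it can be handy to look up the newest file automatically before checking it. The sketch below does that; the function names latest_snapshot and check_latest_snapshot are hypothetical, and it assumes the etcdctl in use still provides snapshot status (newer etcd releases move this subcommand to the separate etcdutl tool).

```shell
#!/bin/sh
# Print the newest snapshot in directory $1.
# Timestamped names sort chronologically, so the last one is the newest.
latest_snapshot() {
    ls -1 "$1"/etcd-snapshot-*.db 2>/dev/null | sort | tail -n 1
}

# Show the status table for the newest snapshot in directory $1.
check_latest_snapshot() (
    snap="$(latest_snapshot "$1")"
    if [ -z "$snap" ]; then
        echo "no snapshot found in $1" >&2
        exit 1
    fi
    ETCDCTL_API=3 etcdctl --write-out=table snapshot status "$snap"
)
```

Running check_latest_snapshot /data/backup on the node would then report on the most recent backup, or fail with a message if the directory holds no snapshots.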

Continue reading on: https://foxutech.com/disaster-recovery-in-the-kubernetes-cluster/

FoxuTech

Discuss about #Linux, #DevOps, #Docker, #kubernetes, #HowTo’s, #cloud & IT technologies like #argocd #crossplane #azure https://foxutech.com/