Backup and Restore Kubernetes Etcd on the Same Control Plane Node

Image: https://etcd.io

A couple of days ago I wrote an article, How to Change Kubernetes Kube-apiserver IP Address, which involved keeping the original etcd data.

In this article I will go through the process of backing up and restoring etcd. Before we start, let’s get a basic understanding of etcd. According to the official website, etcd is

a distributed, reliable key-value store for the most critical data of a distributed system.

Also,

etcd is open source, available on GitHub, and backed by the Cloud Native Computing Foundation.

OK, let’s get started with today’s topic. Please note,

All commands are executed directly on the control plane node.

Backup K8S Cluster

Assume you already have a K8S cluster running some workloads; mine runs an nginx deployment.

The skeleton of the backup command is as below.

ETCDCTL_API=3 etcdctl snapshot save <backup-file-location> \
--endpoints=https://127.0.0.1:2379 \
--cacert=<trusted-ca-file> \
--cert=<cert-file> \
--key=<key-file>

You can find the required information in etcd.yaml, or simply inspect the etcd pod:

kubectl get pods etcd-cp-1 -n kube-system \
-o=jsonpath='{.spec.containers[0].command}' | jq
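
If you prefer to read the values straight from etcd.yaml, here is a minimal sketch. It assumes the kubeadm default manifest path /etc/kubernetes/manifests/etcd.yaml, and the function name etcd_flags is just something I made up for illustration.

```shell
# Print the etcd flags that the etcdctl backup command needs.
# The default manifest path is the kubeadm convention (an assumption here).
etcd_flags() {
    manifest="${1:-/etc/kubernetes/manifests/etcd.yaml}"
    # "--" stops grep from treating the pattern as an option.
    # The --peer-* variants are intentionally not matched.
    grep -oE -- '--(listen-client-urls|trusted-ca-file|cert-file|key-file)=[^ ]+' "$manifest"
}

# Usage: etcd_flags
#        etcd_flags /path/to/etcd.yaml
```

The three file flags map to --cacert, --cert and --key of etcdctl respectively.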

So the command will be

ETCDCTL_API=3 etcdctl snapshot save /tmp/etcdBackup.db \
--endpoints=https://172.31.36.72:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key
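
If you take backups regularly, a fixed file name gets overwritten every time. A small sketch for a timestamped name could look like this; the helper backup_file and the naming scheme are hypothetical, not something etcdctl provides.

```shell
# Build a timestamped snapshot path such as /tmp/etcdBackup-<date>-<time>.db
# The naming scheme is hypothetical; adjust it to taste.
backup_file() {
    dir="${1:-/tmp}"
    printf '%s/etcdBackup-%s.db\n' "$dir" "$(date +%Y%m%d-%H%M%S)"
}

# Usage on the control plane node, with the same flags as the command above:
#   ETCDCTL_API=3 etcdctl snapshot save "$(backup_file /tmp)" \
#     --endpoints=https://172.31.36.72:2379 \
#     --cacert=/etc/kubernetes/pki/etcd/ca.crt \
#     --cert=/etc/kubernetes/pki/etcd/server.crt \
#     --key=/etc/kubernetes/pki/etcd/server.key
```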

We can verify the snapshot with the command

ETCDCTL_API=3 \
etcdctl --write-out=table snapshot status /tmp/etcdBackup.db

And that’s it! We have our K8S etcd backup! Next, let’s restore it back to the cluster.

Restore K8S Cluster

OK, restoring is a little tricky. I could not find a detailed end-to-end process for restoring etcd, only pieces of information scattered everywhere. Based on the official Kubernetes and etcd documentation, I came up with the following procedure.

Disclaimer: The following procedure IS NOT guaranteed to be best practice. Please DO NOT perform it on a production environment; use at your own risk.

Steps:

  1. Use the etcdctl command to restore the backup data
  2. Stop all API server instances on the cluster
  3. Replace the current etcd data with the restored etcd data
  4. Restart the necessary K8S instances
  5. Verify the result

Let’s delete all workloads on the cluster so that we can compare against the final result.

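The restore command is almost the same as the backup command: change save to restore and omit the --endpoints flag. Restore works on the local snapshot file rather than on a running etcd, so no endpoint is needed. To keep the command short, we can export the settings as environment variables first: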

export ETCDCTL_CACERT=/etc/kubernetes/pki/etcd/ca.crt
export ETCDCTL_CERT=/etc/kubernetes/pki/etcd/server.crt
export ETCDCTL_KEY=/etc/kubernetes/pki/etcd/server.key
export ETCDCTL_API=3

Restore etcd with the command below

etcdctl snapshot restore /tmp/etcdBackup.db

The restore command generates a folder called default.etcd in the current working directory. It contains a folder called member, which holds the etcd data we backed up.
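
If you would rather not depend on the current working directory, etcdctl snapshot restore also accepts a --data-dir flag. Below is a minimal sketch; the wrapper restore_snapshot, the ETCDCTL override and the target path in the usage line are my own assumptions for illustration.

```shell
# Restore a snapshot into an explicit directory instead of ./default.etcd.
# ETCDCTL can be overridden (e.g. ETCDCTL=echo) to preview the command;
# that override is a convenience of this sketch, not an etcdctl feature.
restore_snapshot() {
    snapshot="$1"
    data_dir="$2"
    ETCDCTL_API=3 ${ETCDCTL:-etcdctl} snapshot restore "$snapshot" --data-dir="$data_dir"
}

# Usage: restore_snapshot /tmp/etcdBackup.db /root/etcd-restored
```

As far as I know, the target directory should not already contain data, otherwise the restore refuses to run.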

According to the Kubernetes documentation, all API server instances should be stopped before the etcd state is restored, and started again afterwards.

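I had a problem stopping the API server instances but found a solution in How can I stop Kubernetes control plane pods? (linked at the end of this article): move the static pod yaml files out of the kubelet's manifest directory. First, check the status of the api-server, scheduler and controller.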

Stop all of them. Once they are stopped, the kubectl command no longer works because the api-server is down.
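
Since the screenshots do not show the exact commands, here is a rough sketch of how the control plane pods can be stopped and started by moving the static pod yaml files. It assumes the kubeadm default manifest directory /etc/kubernetes/manifests; the parking directory and both function names are hypothetical.

```shell
# Move every static pod manifest out of the directory the kubelet watches.
# /etc/kubernetes/manifests is the kubeadm default (an assumption here);
# the parking directory name is hypothetical.
stop_static_pods() {
    manifests="${1:-/etc/kubernetes/manifests}"
    parked="${2:-/root/manifests-parked}"
    mkdir -p "$parked"
    mv "$manifests"/*.yaml "$parked"/
}

# Move the manifests back so the kubelet starts the pods again.
start_static_pods() {
    manifests="${1:-/etc/kubernetes/manifests}"
    parked="${2:-/root/manifests-parked}"
    mv "$parked"/*.yaml "$manifests"/
}

# Usage: stop_static_pods      # before swapping the etcd data
#        start_static_pods     # after swapping the etcd data
```

The kubelet needs a short moment to notice the change; something like docker ps can be used to confirm that the containers are gone.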

The current etcd data is located at /var/lib/etcd/member; the restored etcd data is located at /root/default.etcd/member in my case. Let’s swap the data.
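
The swap can be sketched roughly as below. The defaults are the two paths from my setup, while the location for the old data (/root/etcd-member.old) and the function name are hypothetical. It also assumes the etcd static pod is stopped while the data is moved.

```shell
# Replace the live etcd data with the restored data, keeping the old data
# around instead of deleting it. The third path is a hypothetical location.
swap_etcd_data() {
    current="${1:-/var/lib/etcd/member}"
    restored="${2:-/root/default.etcd/member}"
    old_copy="${3:-/root/etcd-member.old}"
    if [ ! -d "$restored" ]; then
        echo "restored data not found: $restored" >&2
        return 1
    fi
    if [ -e "$old_copy" ]; then
        echo "refusing to overwrite: $old_copy" >&2
        return 1
    fi
    mv "$current" "$old_copy" && mv "$restored" "$current"
}

# Usage: swap_etcd_data
```

Keeping the old member folder makes it possible to roll back if the restored data turns out to be wrong.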

Let’s move all the yaml files back to where they belong, followed by restarting the docker service. Make sure docker has started correctly.
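
After the restart, the API server may need a little while before kubectl answers again. A small retry helper can save some manual polling; wait_for and WAIT_INTERVAL are names I made up for this sketch.

```shell
# Retry a command until it succeeds or the attempts run out.
# WAIT_INTERVAL (seconds between attempts) is a knob of this sketch.
wait_for() {
    tries="$1"
    shift
    attempt=1
    until "$@"; do
        if [ "$attempt" -ge "$tries" ]; then
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "${WAIT_INTERVAL:-5}"
    done
}

# Usage after the yaml files are back in place:
#   systemctl restart docker
#   systemctl is-active docker
#   wait_for 24 kubectl get nodes
```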

List all pods: we can see that the deleted nginx deployment, replicaset and pods are back again, and all the K8S components are running too!

Lastly, let’s provision something new to double-check that K8S is functional.

If the K8S cluster is not working correctly, restarting the docker service again might help.

Test the cluster by scaling out a deployment.
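
A scale-out test can be sketched as follows. The deployment name nginx and the replica count in the usage line are assumptions, and the KUBECTL override only exists in this sketch so the commands can be previewed with KUBECTL=echo.

```shell
# Scale a deployment and list the pods afterwards.
# KUBECTL can be overridden (e.g. KUBECTL=echo) to preview the commands;
# that override belongs to this sketch, not to kubectl.
scale_and_list() {
    deployment="$1"
    replicas="$2"
    ${KUBECTL:-kubectl} scale deployment "$deployment" --replicas="$replicas" &&
        ${KUBECTL:-kubectl} get pods -o wide
}

# Usage: scale_and_list nginx 5
```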

And that’s it, everything seems to be working correctly! Hope you enjoyed it : )

How can I stop Kubernetes control plane pods?
Backing up an etcd cluster
Disaster recovery

AWS Certified SA, SysOps & Developer Associate, Alibaba Cloud certified SA. Focusing on Azure, Prometheus w/ Grafana, ELK and K8S now.