Backup and Restore Kubernetes Etcd on the Same Control Plane Node
Image: https://etcd.io
A couple of days ago I wrote an article about How to Change Kubernetes Kube-apiserver IP Address, which involves keeping the original etcd data.
In this article I will go through the process of backing up and restoring etcd. Before we start, let’s build a basic understanding of etcd first. According to the official documentation, etcd is
a distributed, reliable key-value store for the most critical data of a distributed system.
Also,
etcd is open source, available on GitHub, and backed by the Cloud Native Computing Foundation.
OK, let’s get started with today’s topic. Please do note,
All commands are executed directly on the control plane node.
Backup K8S Cluster
Assuming you already have a K8S cluster running workloads as below.
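If you want to confirm what is currently running before taking the backup, a quick check could look like this (the workload names will be whatever you have deployed):
# List the workloads we expect to see again after the restore
kubectl get deployments,replicasets,pods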
The skeleton of the backup command is as below.
ETCDCTL_API=3 etcdctl snapshot save <backup-file-location> \
--endpoints=https://127.0.0.1:2379 \
--cacert=<trusted-ca-file> \
--cert=<cert-file> \
--key=<key-file>
You can find the required information in etcd.yaml, or simply inspect the etcd pod:
kubectl get pods etcd-cp-1 -n kube-system \
-o=jsonpath='{.spec.containers[0].command}' | jq
So the command will be
ETCDCTL_API=3 etcdctl snapshot save /tmp/etcdBackup.db \
--endpoints=https://172.31.36.72:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key
We can verify the snapshot with the following command:
ETCDCTL_API=3 \
etcdctl --write-out=table snapshot status /tmp/etcdBackup.db
And that’s it! We have our K8S etcd backup! Next, let’s restore it back to the cluster.
Restore K8S Cluster
OK, restoring is a little tricky. I could NOT find any detailed process for restoring etcd, only pieces of information scattered everywhere. Based on the Kubernetes and etcd official websites, I came up with the following procedure for restoring.
Disclaimer: the following procedure IS NOT guaranteed to be best practice. Please DO NOT perform it on a production environment; use at your own risk.
Steps:
- Use the etcdctl command to restore the backup data
- Stop all API server instances on the cluster
- Replace the current etcd data with the restored etcd data
- Restart the necessary K8S instances
- Verify the result
First, let’s delete all workloads on the cluster so we can compare the final result.
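For example, assuming the demo workloads live in the default namespace, something like this clears them out:
# Delete all deployments (and therefore their replicasets and pods) in the default namespace
kubectl delete deployments --all
kubectl get pods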
1. Restore etcd data
The restore command is almost the same as the backup command: change save to restore and omit the --endpoints flag. To keep the command short, we can export the connection settings first:
export ETCDCTL_CACERT=/etc/kubernetes/pki/etcd/ca.crt
export ETCDCTL_CERT=/etc/kubernetes/pki/etcd/server.crt
export ETCDCTL_KEY=/etc/kubernetes/pki/etcd/server.key
export ETCDCTL_API=3
Restore etcd with the command below:
etcdctl snapshot restore /tmp/etcdBackup.db
The restore command generates a folder called default.etcd, which contains a folder called member; this is the etcd data we backed up.
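As a side note (not part of the original walkthrough), etcdctl also accepts a --data-dir flag if you prefer the restored data to land somewhere other than ./default.etcd; the target path below is just an example:
etcdctl snapshot restore /tmp/etcdBackup.db --data-dir /root/restored.etcd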
2. Stop API server instances
According to the Kubernetes documentation, all API server instances should be stopped before restoring etcd and restarted once the restore is complete.
I had trouble stopping the API server instances but found a solution here: check the status of the api-server, scheduler, and controller-manager, then stop all of them. Once they are stopped, kubectl commands no longer work because the api-server is down.
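The solution boils down to moving the static pod manifests out of the kubelet’s manifest directory, which makes the kubelet tear the pods down. A rough sketch, assuming the default kubeadm path /etc/kubernetes/manifests and a scratch directory of my own choosing:
# Move the control plane manifests away so the kubelet stops the pods
mkdir -p /root/manifests-backup
mv /etc/kubernetes/manifests/kube-apiserver.yaml \
   /etc/kubernetes/manifests/kube-scheduler.yaml \
   /etc/kubernetes/manifests/kube-controller-manager.yaml \
   /etc/kubernetes/manifests/etcd.yaml \
   /root/manifests-backup/
# Confirm the corresponding containers are gone (this may take a few seconds)
docker ps | grep -E 'kube-apiserver|kube-scheduler|kube-controller-manager|etcd'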
3. Replace etcd data
The current etcd data is located at /var/lib/etcd/member; the restored etcd data is located at /root/default.etcd/member in my case. Let’s swap the data.
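A minimal sketch of the swap, using the paths above and keeping the old data around just in case:
mv /var/lib/etcd/member /var/lib/etcd/member.bak
mv /root/default.etcd/member /var/lib/etcd/member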
4. Restart necessary K8S instances
Let’s move all the yaml files back to where they belong, followed by restarting the docker service. Make sure docker starts correctly.
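Continuing the earlier sketch, that means putting the manifests back and bouncing docker:
# The kubelet will recreate the control plane pods once the manifests reappear
mv /root/manifests-backup/*.yaml /etc/kubernetes/manifests/
systemctl restart docker
systemctl status docker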
5. Verify the restore
Listing all our pods, we can see that the deleted nginx deployment, replicaset, and pods are back again, and all the K8S components are running too!
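A couple of quick checks:
kubectl get deployments,replicasets,pods
kubectl get pods -n kube-system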
Lastly, let’s do some provisioning to double-check the cluster’s functionality.
If the K8S cluster is not working correctly, restarting the docker service again might help.
Test the cluster with the scale-out functionality.
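For instance, scaling the nginx deployment (name assumed from the restored workloads above; adjust to your own) and watching the new pods appear:
kubectl scale deployment nginx --replicas=4
kubectl get pods -w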
And that’s it, everything seems to be working correctly! Hope you enjoyed it : )
References:
How can I stop Kubernetes control plane pods?
Backing up an etcd cluster
Disaster recovery