Backup and Restore Kubernetes Etcd on the Same Control Plane Node
Image: https://etcd.io
A couple of days ago I wrote an article about How to Change Kubernetes Kube-apiserver IP Address, which involves keeping the original etcd data.
In this article I will go through the process of backing up and restoring etcd. Before we start, let’s build a basic understanding of etcd first. According to the official documentation, etcd is
a distributed, reliable key-value store for the most critical data of a distributed system.
Also,
etcd is open source, available on GitHub, and backed by the Cloud Native Computing Foundation.
OK, let’s get started with today’s topic. Please do note,
All commands are executed directly on the control plane node.
Backup K8S Cluster
Assuming you already have a K8S cluster running workloads as below.
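If you want to confirm what is currently running before taking the backup, a quick check could look like this (the workload names will be whatever you have deployed):
# List the workloads we expect to see again after the restore
kubectl get deployments,replicasets,pods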
The skeleton of the backup command is as below.
ETCDCTL_API=3 etcdctl snapshot save <backup-file-location> \
--endpoints=https://127.0.0.1:2379 \
--cacert=<trusted-ca-file> \
--cert=<cert-file> \
--key=<key-file>
You can find the required information in etcd.yaml, or simply inspect the etcd pod:
kubectl get pods etcd-cp-1 -n kube-system \
-o=jsonpath='{.spec.containers[0].command}' | jq
So the command will be
ETCDCTL_API=3 etcdctl snapshot save /tmp/etcdBackup.db \
--endpoints=https://172.31.36.72:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key
We can verify the snapshot with the following command:
ETCDCTL_API=3 \
etcdctl --write-out=table snapshot status /tmp/etcdBackup.db
And that’s it! We have our K8S etcd backup! Next, let’s restore it back to the cluster.
Restore K8S Cluster
OK, restoring is a little tricky. I could NOT find any detailed process for restoring etcd, only pieces of information scattered everywhere. Based on the Kubernetes and etcd official websites, I came up with the following procedure for restoring.
Disclaimer: the following procedure IS NOT guaranteed to be best practice. Please DO NOT perform it on a production environment; use at your own risk.
Steps:
- Use the etcdctl command to restore the backup data
- Stop all API server instances on the cluster
- Replace the current etcd data with the restored etcd data
- Restart the necessary K8S instances
- Verify the result
First, let’s delete all workloads on the cluster so we can compare the final result.
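For example, assuming the demo workloads live in the default namespace, something like this clears them out:
# Delete all deployments (and therefore their replicasets and pods) in the default namespace
kubectl delete deployments --all
kubectl get pods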
1. Restore etcd data
The restore command is almost the same as the backup command: change save to restore and omit the --endpoints flag. To keep the command short, we can export the connection settings first:
export ETCDCTL_CACERT=/etc/kubernetes/pki/etcd/ca.crt
export ETCDCTL_CERT=/etc/kubernetes/pki/etcd/server.crt
export ETCDCTL_KEY=/etc/kubernetes/pki/etcd/server.key
export ETCDCTL_API=3
Restore etcd with the command below:
etcdctl snapshot restore /tmp/etcdBackup.db
The restore command generates a folder called default.etcd, which contains a folder called member; this is the etcd data we backed up.
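As a side note (not part of the original walkthrough), etcdctl also accepts a --data-dir flag if you prefer the restored data to land somewhere other than ./default.etcd; the target path below is just an example:
etcdctl snapshot restore /tmp/etcdBackup.db --data-dir /root/restored.etcd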
2. Stop API server instances
According to the Kubernetes documentation, all API server instances should be stopped before restoring etcd and restarted once the restore is complete.
I had trouble stopping the API server instances but found a solution here: check the status of the api-server, scheduler, and controller-manager, then stop all of them. Once they are stopped, kubectl commands no longer work because the api-server is down.
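The solution boils down to moving the static pod manifests out of the kubelet’s manifest directory, which makes the kubelet tear the pods down. A rough sketch, assuming the default kubeadm path /etc/kubernetes/manifests and a scratch directory of my own choosing:
# Move the control plane manifests away so the kubelet stops the pods
mkdir -p /root/manifests-backup
mv /etc/kubernetes/manifests/kube-apiserver.yaml \
   /etc/kubernetes/manifests/kube-scheduler.yaml \
   /etc/kubernetes/manifests/kube-controller-manager.yaml \
   /etc/kubernetes/manifests/etcd.yaml \
   /root/manifests-backup/
# Confirm the corresponding containers are gone (this may take a few seconds)
docker ps | grep -E 'kube-apiserver|kube-scheduler|kube-controller-manager|etcd'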
3. Replace etcd data
The current etcd data is located at /var/lib/etcd/member; the restored etcd data is located at /root/default.etcd/member in my case. Let’s swap the data.
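A minimal sketch of the swap, using the paths above and keeping the old data around just in case:
mv /var/lib/etcd/member /var/lib/etcd/member.bak
mv /root/default.etcd/member /var/lib/etcd/member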
4. Restart necessary K8S instances
Let’s move all the yaml files back to where they belong, followed by restarting the docker service. Make sure docker starts correctly.
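Continuing the earlier sketch, that means putting the manifests back and bouncing docker:
# The kubelet will recreate the control plane pods once the manifests reappear
mv /root/manifests-backup/*.yaml /etc/kubernetes/manifests/
systemctl restart docker
systemctl status docker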
5. Verify the restore
Listing all our pods, we can see that the deleted nginx deployment, replicaset, and pods are back again, and all the K8S components are running too!
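A couple of quick checks:
kubectl get deployments,replicasets,pods
kubectl get pods -n kube-system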
Lastly, let’s do some provisioning to double-check the cluster’s functionality.
If the K8S cluster is not working correctly, restarting the docker service again might help.
Test the cluster with the scale-out functionality.
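For instance, scaling the nginx deployment (name assumed from the restored workloads above; adjust to your own) and watching the new pods appear:
kubectl scale deployment nginx --replicas=4
kubectl get pods -w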
And that’s it, everything seems to be working correctly! Hope you enjoyed it : )
References:
How can I stop Kubernetes control plane pods?
Backing up an etcd cluster
Disaster recovery