Learn to back up and restore an etcd cluster in Kubernetes, including verification, snapshot creation, and restoration procedures.
In this lesson, you will learn how to back up and restore an etcd cluster running on a Kubernetes control plane. We begin by verifying the existing deployments, inspecting the etcd container setup, and then proceed with the backup and restore procedures.
To identify the version of etcd and verify pod details, locate the etcd pod in the kube-system namespace. It is typically configured as a static pod, so you can review its description to inspect the container details and command-line parameters. For example, examining the etcd container shows:
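The following commands locate the pod and display its container details, including the image tag (which embeds the etcd version) and command-line flags. The pod name etcd-controlplane is the kubeadm default and may differ on your cluster:

```shell
# Find the etcd pod in the kube-system namespace
kubectl get pods -n kube-system | grep etcd

# Show the container image, command-line flags, and probe configuration
# (pod name assumes kubeadm defaults on a node named "controlplane")
kubectl describe pod etcd-controlplane -n kube-system
```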
From the information provided, the etcd version is 3.5.1, listening on the loopback address (127.0.0.1) at port 2379. The container’s command-line options also specify critical certificate file locations:
Server certificate: /etc/kubernetes/pki/etcd/server.crt
CA certificate: /etc/kubernetes/pki/etcd/ca.crt
A similar configuration is confirmed in the second block:
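A sketch of how the corresponding command-line section of the static pod manifest typically looks on a kubeadm cluster. The image tag and exact flag set are assumptions inferred from the version and certificate paths above, not confirmed by this lesson:

```yaml
# Fragment of /etc/kubernetes/manifests/etcd.yaml (illustrative kubeadm-style defaults)
containers:
- command:
  - etcd
  - --advertise-client-urls=https://127.0.0.1:2379
  - --cert-file=/etc/kubernetes/pki/etcd/server.crt
  - --key-file=/etc/kubernetes/pki/etcd/server.key
  - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
  - --data-dir=/var/lib/etcd
  image: k8s.gcr.io/etcd:3.5.1-0   # tag assumed from the reported etcd version
```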
When maintenance, such as a master node reboot, is scheduled, it is best practice to back up the etcd data first. Using the built-in snapshot functionality ensures that your Kubernetes cluster data can be restored if needed. Review the static pod definition fragment in /etc/kubernetes/manifests/etcd.yaml and notice the configuration for volume mounts and probes:
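A hedged sketch of that fragment, following the usual kubeadm layout. The probe port (2381) and the etcd-certs mount are typical kubeadm defaults assumed here, not values confirmed by this lesson:

```yaml
# Fragment of /etc/kubernetes/manifests/etcd.yaml (illustrative kubeadm-style layout)
    volumeMounts:
    - mountPath: /var/lib/etcd
      name: etcd-data
    - mountPath: /etc/kubernetes/pki/etcd
      name: etcd-certs
    livenessProbe:
      httpGet:
        host: 127.0.0.1
        path: /health
        port: 2381            # kubeadm serves etcd health checks on this port by default
  volumes:
  - hostPath:
      path: /var/lib/etcd
      type: DirectoryOrCreate
    name: etcd-data
  - hostPath:
      path: /etc/kubernetes/pki/etcd
      type: DirectoryOrCreate
    name: etcd-certs
```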
Before initiating maintenance, take an etcd snapshot as a backup. Ensure that the etcdctl API version is set to 3. The snapshot command requires specifying the endpoint along with the proper certificates. Run the following command to create a snapshot:
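On builds of etcdctl that default to the v2 API, the v3 API must be selected explicitly before running snapshot commands, for example:

```shell
# Select the v3 API explicitly; snapshot save/restore are v3-only commands
export ETCDCTL_API=3
```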
```
root@controlplane ~ ✗ etcdctl snapshot save --endpoints=127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  /opt/snapshot-pre-boot.db
Snapshot saved at /opt/snapshot-pre-boot.db
```
Verify the backup has been saved:
```
root@controlplane ~ ➜ ls /opt/snapshot-pre-boot.db
/opt/snapshot-pre-boot.db
```
After the maintenance (for example, following a reboot), you may notice that the cluster applications are inaccessible. Confirm the current state by checking the deployments, pods, and services:
```
root@controlplane ~ ➜ kubectl get deploy
No resources found in default namespace.
root@controlplane ~ ➜ k get pods
No resources found in default namespace.
root@controlplane ~ ➜ k get svc
NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP   20s
```
Since deployments and services are missing, you need to restore the etcd data from your backup.
To restore etcd from the snapshot, use the etcdctl snapshot restore command. This process does not require contacting an active etcd endpoint; it restores the data from the backup file into a new data directory. For example, run:
```
root@controlplane ~ # etcdctl snapshot restore --data-dir /var/lib/etcd-from-backup /opt/snapshot-pre-boot.db
2022-04-17 20:54:34.669639 I | mvcc: restore compact to 2377
2022-04-17 20:54:34.676896 I | etcdserver/membership: added member 8e9e05c52164694d [http://localhost:2380] to cluster cdf818194e3a8c32
```
Confirm that the new directory (/var/lib/etcd-from-backup) contains the restored data.
After restoring the etcd data, update the etcd static pod manifest (located at /etc/kubernetes/manifests/etcd.yaml) to point to the new data directory. Modify the hostPath for the etcd-data volume from /var/lib/etcd to /var/lib/etcd-from-backup. The updated volume section should look like:
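A sketch of the updated etcd-data volume entry, assuming the standard kubeadm manifest layout; only the hostPath path changes:

```yaml
# In /etc/kubernetes/manifests/etcd.yaml, point etcd-data at the restored directory
  volumes:
  - hostPath:
      path: /var/lib/etcd-from-backup   # previously /var/lib/etcd
      type: DirectoryOrCreate
    name: etcd-data
```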
Once you update and save the changes, the kubelet will automatically restart the etcd pod with the updated manifest configuration. Monitor the pod’s status to ensure it transitions to the Running state:
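One way to monitor this step is shown below; the pod name etcd-controlplane is the kubeadm default and may differ on your cluster:

```shell
# The kubelet re-reads /etc/kubernetes/manifests and recreates the pod;
# watch until etcd-controlplane reports Running
watch kubectl get pods -n kube-system
```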
If the new etcd pod fails to reach the Running state, such as due to failed liveness or startup probes, consider manually deleting the pod to force a restart.
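Deleting the mirror pod prompts the kubelet to recreate it from the on-disk manifest; for example, assuming the default kubeadm pod name:

```shell
# Static pods are owned by the kubelet, so deletion triggers an immediate
# recreation from /etc/kubernetes/manifests/etcd.yaml
kubectl delete pod etcd-controlplane -n kube-system
```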
Finally, verify the restored state of the cluster by checking that deployments, pods, and services are now available:
```
root@controlplane ~ ⟶ kubectl get deploy
root@controlplane ~ ⟶ k get pods -n kube-system
NAME                READY   STATUS    RESTARTS   AGE
etcd-controlplane   1/1     Running   0          14s
...
```
With the etcd backup successfully restored and the static pod manifest updated, your Kubernetes control plane resumes normal operations and the applications become accessible again. This completes the backup and restore process for etcd in a Kubernetes cluster.