Setting Up the Swarm Cluster
In this demo, three servers are used. One server acts as the Docker master, while the other two serve as Docker nodes. Initially, each server has only Docker installed, differing only by hostname. To establish the swarm cluster, identify the Docker master among the three servers and initialize the swarm there. Running `docker swarm init` without an advertised address causes an error when the Docker host has multiple network interfaces, so the address must be specified with the `--advertise-addr` flag. In this example, we choose to advertise the IP address 192.168.56.101.
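The initialization command, using the example IP address above, looks like this:

```shell
# Run on the master node; advertise the interface other nodes will use to reach it
docker swarm init --advertise-addr 192.168.56.101
```

On success, Docker prints a `docker swarm join` command containing a worker join token, which is used in the next section.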
Running `docker node ls` on the master shows only the master node (marked as active and the leader), since no additional nodes have been added yet.
Adding Worker Nodes
To add worker nodes, follow these steps:

- Copy the generated `docker swarm join` command.
- Execute it on each worker server.
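The join command printed by `docker swarm init` resembles the following; `<worker-token>` is a placeholder for the token generated by your swarm:

```shell
# Run on each worker server; the token and IP come from the init output
docker swarm join --token <worker-token> 192.168.56.101:2377
```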
Running `docker node ls` on the master now displays three nodes in total: one master and two workers.
Removing a Node from the Cluster
If you need to remove a node, first execute `docker swarm leave` on the node you wish to remove from the swarm. Keep in mind that there may be a short delay before the node’s status changes from “Ready” to “Down” in the swarm listing. To completely remove a node that still appears in the list, run `docker node rm` from the master node.
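The two-step removal can be sketched as follows; `docker-node2` stands in for the name of the node being removed:

```shell
# Step 1: on the node that should leave the swarm
docker swarm leave

# Step 2: on the master, once the node's status shows "Down"
docker node rm docker-node2
```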
Running `docker node ls` then confirms that the node has been deleted from the swarm.
Adding Manager Nodes
A Docker Swarm cluster can support multiple manager nodes, which significantly improves high availability. To add a new manager:

- Change the hostname of the node if necessary.
- Retrieve the manager join token by running `docker swarm join-token manager` on an existing manager.
- Execute the printed join command on the new manager node (in this example, `docker-master2`).
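The steps above can be sketched as follows; `<manager-token>` is a placeholder for the token generated by your swarm:

```shell
# On an existing manager: print the manager join token and full join command
docker swarm join-token manager

# On the new manager node (docker-master2): run the printed command
docker swarm join --token <manager-token> 192.168.56.101:2377
```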
Running `docker node ls` will show two managers: one labeled as Leader and the other as Reachable. Worker nodes will have no manager status listed.
To add an additional worker node, retrieve the worker join token with `docker swarm join-token worker` and run the printed command on the new node.
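Retrieving the worker token works the same way as for managers:

```shell
# On a manager: print the worker join token and full join command
docker swarm join-token worker
```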
Promoting a Worker to a Manager
At times, you might need to promote an existing worker node to a manager. To do so, run `docker node promote` from an active manager node. Afterwards, `docker node ls` confirms that `docker-node2` now shows a manager status—typically “Reachable”—while the remaining worker nodes continue to display a blank manager status.
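The promotion command is a single line; `docker-node2` stands in for the worker being promoted:

```shell
# On an active manager: promote an existing worker to manager
docker node promote docker-node2
```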
Simulating Node Failures and Managing Quorum
An important aspect of managing a Docker Swarm cluster is handling node failures. To simulate a manager node failure, shut down one of the manager nodes. For example, shutting down `docker-master3` will cause its status to change to “Unreachable” in the node listing.
A Docker Swarm cluster with three managers requires a majority (at least two) online to maintain quorum. If too many managers go offline, management commands will fail with quorum-related error messages.
Recovering from a Manager Failure
If a manager failure causes the loss of quorum and it is not possible to bring all managers back online, you can force-create a new swarm cluster. Use the `--force-new-cluster` flag along with the `--advertise-addr` parameter.
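The recovery command, using the example IP address from earlier, looks like this:

```shell
# On the surviving manager: recreate the swarm as a single-manager cluster
docker swarm init --advertise-addr 192.168.56.101 --force-new-cluster
```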
Running `docker node ls` will then display the master node as the only manager. You can then rejoin the worker nodes to the new cluster. Note that previously registered manager nodes will lose their manager status in the process.
Recap
In this guide, we covered the following key points:

| Task | Description | Command/Example |
|---|---|---|
| Initialize Docker Swarm | Start a swarm on the designated master node with the correct advertised IP address. | `docker swarm init --advertise-addr 192.168.56.101` |
| Add Worker Node | Use the provided worker token to join a new worker node to the swarm. | `docker swarm join --token <worker-token> 192.168.56.101:2377` |
| Remove Node | Remove a node from the swarm gracefully and then force-remove it from the master listing. | `docker swarm leave` and `docker node rm <node>` |
| Add Manager Node | Retrieve the manager join token and execute the join command on the new manager node. | `docker swarm join-token manager` |
| Promote Worker | Promote an existing worker node to a manager to expand the managerial capacity. | `docker node promote <worker-node>` |
| Simulate and Manage Failures | Observe how the cluster behaves when manager nodes go down and manage quorum issues. | Shut down a manager to simulate failure and run `docker node ls` |
| Force New Cluster Recovery | Recover from a manager failure and lost quorum by forcing a new cluster formation. | `docker swarm init --advertise-addr 192.168.56.101 --force-new-cluster` |