Vue normale

Make docker swarm HA with keepalived |・∀・

5 mars 2026 à 18:10

While having a self-healing, scalable docker swarm is great for availability and scalability, none of that is worth a sausage if nobody can connect to your cluster!

Preparation

Enable IPVS module

On all nodes which will participate in keepalived, we need the "ip_vs" kernel module, in order to permit services to bind to non-local interface addresses.

Set this up once-off for both the primary and secondary nodes, by running:

echo "modprobe ip_vs" >> /etc/modules
modprobe ip_vs

Setup nodes

Assuming your IPs are as per the following example:

192.168.4.1 : Primary
192.168.4.2 : Secondary
192.168.4.3 : Virtual

Run the following on the primary

docker run -d --name keepalived --restart=always \
--cap-add=NET_ADMIN --cap-add=NET_BROADCAST --cap-add=NET_RAW --net=host \
-e KEEPALIVED_UNICAST_PEERS="#PYTHON2BASH:['192.168.4.1', '192.168.4.2']" \
-e KEEPALIVED_VIRTUAL_IPS=192.168.4.3 \
-e KEEPALIVED_PRIORITY=200 \
osixia/keepalived:2.0.20

And on the secondary2:

docker run -d --name keepalived --restart=always \
--cap-add=NET_ADMIN --cap-add=NET_BROADCAST --cap-add=NET_RAW --net=host \
-e KEEPALIVED_UNICAST_PEERS="#PYTHON2BASH:['192.168.4.1', '192.168.4.2']" \
-e KEEPALIVED_VIRTUAL_IPS=192.168.4.3 \
-e KEEPALIVED_PRIORITY=100 \
osixia/keepalived:2.0.20

Serving

That's it. Each node will talk to the other via unicast (no need to un-firewall multicast addresses), and the node with the highest priority gets to be the master. When ingress traffic arrives on the master node via the VIP, docker's routing mesh will deliver it to the appropriate docker node.
Summary

What have we achieved?

Summary

Created:

A Virtual IP to which all cluster traffic can be forwarded externally, making it "Highly Available"

The easy, 5-minute install

I share (with sponsors and patrons) a private "premix" GitHub repository, which includes an ansible playbook for deploying the entire Geek's Cookbook stack, automatically. This means that members can create the entire environment with just a git pull and an ansible-playbook deploy.yml 👍

Chef's notes 📓

Some hosting platforms (OpenStack, for one) won't allow you to simply "claim" a virtual IP. Each node is only able to receive traffic targetted to its unique IP, unless certain security controls are disabled by the cloud administrator. In this case, keepalived is not the right solution, and a platform-specific load-balancing solution should be used. In OpenStack, this is Neutron's "Load Balancer As A Service" (LBAAS) component. AWS, GCP and Azure would likely include similar protections. ↩

More than 2 nodes can participate in keepalived. Simply ensure that each node has the appropriate priority set, and the node with the highest priority will become the master.

Direct link
❌