Dueling swarm managers when using ZK discovery #2654
Description
When using the ZooKeeper discovery backend, it is possible for more than one primary swarm manager to be elected in the cluster.
When the ZooKeeper session for the primary swarm manager is closed (for example, due to session expiration), the docker/libkv implementation does not release its lock on the leader node. However, once the session is closed, ZooKeeper removes the ephemeral node for the current primary, allowing other managers to acquire the lock.
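For illustration, here is a minimal sketch using the github.com/samuel/go-zookeeper/zk client that libkv's ZooKeeper backend is built on. It only shows where session expiration surfaces (on the event channel returned by zk.Connect); the server address is taken from the logs below, and the handling is an assumption about what a lock holder would need to do, not libkv's actual code:

```go
package main

import (
	"log"
	"time"

	"github.com/samuel/go-zookeeper/zk"
)

func main() {
	conn, events, err := zk.Connect([]string{"192.168.200.3:2181"}, 10*time.Second)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	for ev := range events {
		switch ev.State {
		case zk.StateHasSession:
			log.Println("session established; safe to (re)contend for the lock")
		case zk.StateDisconnected:
			// Transient: the session (and its ephemeral nodes) may still survive.
			log.Println("disconnected from ZooKeeper; session may still be alive")
		case zk.StateExpired:
			// The server has discarded the session, so every ephemeral node it
			// owned (including the leader lock node) is gone. Any lock held
			// through this session must be treated as lost at this point.
			log.Println("session expired; leadership must be considered lost")
		}
	}
}
```

A lock holder that ignores StateExpired keeps believing it is the leader even though its ephemeral lock node has already been deleted, which is the situation described above.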
For example, here are logs from the primary:
2017-03-10T01:29:30.419205Z 2017/03/10 01:29:30 Recv loop terminated: err=read tcp 192.168.200.3:41548->192.168.200.3:2181: i/o timeout
2017-03-10T01:29:45.502516Z 2017/03/10 01:29:30 Send loop terminated: err=<nil>
2017-03-10T01:29:46.261610Z 2017/03/10 01:29:32 Connected to 192.168.200.3:2181
2017-03-10T01:29:47.040309Z 2017/03/10 01:29:47 Authentication failed: zk: session has been expired by the server
2017-03-10T01:29:48.016925Z 2017/03/10 01:29:48 Connected to 192.168.200.3:2181
time="2017-03-10T01:29:48Z" level=error msg="Discovery error: Unexpected watch error"
time="2017-03-10T01:29:48Z" level=error msg="Leader Election: watch leader channel closed, the store may be unavailable..."
2017-03-10T01:29:49.958539Z 2017/03/10 01:29:48 Authentication failed: EOF
2017-03-10T01:29:49.964725Z 2017/03/10 01:29:49 Connected to 192.168.200.3:2181
2017-03-10T01:29:49.975560Z 2017/03/10 01:29:49 Authentication failed: EOF
2017-03-10T01:29:50.850046Z 2017/03/10 01:29:50 Connected to 192.168.200.3:2181
2017-03-10T01:29:50.874334Z 2017/03/10 01:29:50 Authentication failed: EOF
2017-03-10T01:29:51.855878Z 2017/03/10 01:29:51 Connected to 192.168.200.3:2181
2017-03-10T01:29:51.882071Z 2017/03/10 01:29:51 Authenticated: id=216172961576583171, timeout=10000
2017-03-10T01:29:51.884640Z 2017/03/10 01:29:51 Re-submitting `0` credentials after reconnect
At this point the session has been closed, yet we never see the message "Leader Election: Cluster leadership lost" because this node still believes it holds the leader lock. Shortly afterwards, another leader is elected:
time="2017-03-10T01:29:58Z" level=info msg="New leader elected: 192.168.200.4:3375"
After this point things get 'interesting' as the two primary swarm managers fight over container rescheduling.
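To make the failure mode concrete, here is a minimal sketch of a leader loop written against libkv's lock API (the interface swarm's leader election sits on); the endpoint, lock key, and advertised address are illustrative, and the exact signatures should be checked against the libkv version swarm vendors. The channel returned by Lock is meant to be closed when the lock is lost; the bug is that the ZooKeeper backend never closes it after a session expiration, so this loop never steps down:

```go
package main

import (
	"log"
	"time"

	"github.com/docker/libkv"
	"github.com/docker/libkv/store"
	"github.com/docker/libkv/store/zookeeper"
)

func main() {
	zookeeper.Register() // register the ZooKeeper backend with libkv

	kv, err := libkv.NewStore(store.ZK, []string{"192.168.200.3:2181"},
		&store.Config{ConnectionTimeout: 10 * time.Second})
	if err != nil {
		log.Fatal(err)
	}

	lock, err := kv.NewLock("docker/swarm/leader",
		&store.LockOptions{Value: []byte("192.168.200.3:3375")})
	if err != nil {
		log.Fatal(err)
	}

	stopCh := make(chan struct{})
	for {
		// Blocks until the lock is acquired; this manager is now primary.
		lostCh, err := lock.Lock(stopCh)
		if err != nil {
			log.Printf("failed to acquire leader lock: %v", err)
			time.Sleep(time.Second)
			continue
		}
		log.Println("acquired leader lock, acting as primary")

		// Step down when the backend reports the lock as lost. With the
		// unpatched ZooKeeper backend this never fires after a session
		// expiration, which is how two primaries end up running at once.
		<-lostCh
		log.Println("leader lock lost, stepping down")
	}
}
```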
A PR for this bug has been submitted to the docker/libkv project: docker/libkv#156