Master not discovered after master restart in Docker Swarm

Hi, I’m dealing with a really strange issue. I got a 4-node (1 master, 3 data) cluster working, and all the nodes could see each other with no issues.

The problem came when I decided to test how things would behave through a full host restart (all hosts). The config stayed the same, but suddenly CrateDB can’t seem to find the master.
Networking is fine: the hostnames resolve and can be pinged. What’s even stranger is that it DOES discover the master, but I think because it’s now on a different IP it no longer recognises it.

Node names:

  • crate-master-1
  • crate-data-1
  • crate-data-2
  • crate-data-3

[2022-09-23T12:03:07,705][WARN ][o.e.c.c.ClusterFormationFailureHelper] [crate-data-1] master not discovered yet: have discovered [{crate-data-1}{3OkXgv94Ty-_7XF_w8NFUg}{EnL7NCzhT0CZB-O-6U1RSA}{10.0.3.22}{10.0.3.22:4300}{http_address=10.0.3.22:4200}, {crate-master-1}{rPbVptbIQbKi5NgEXxIp7w}{pCtCqGNXRtW_eHrJs35YzQ}{10.0.3.21}{10.0.3.21:4300}{http_address=10.0.3.21:4200}]; discovery will continue using [10.0.3.21:4300] from hosts providers and [ ] from last-known cluster state; node term 0, last-accepted version 0 in term 0

As you can see in the log, crate-data-1 DOES discover crate-master-1!

Here is my swarm stack file:

version: '3.8'

services:
  crate-master:
    image: crate:latest
    ports:
      - "4200:4200"
#      - "4300:4300"
#      - "5432:5432"
    environment:
      SLOT: "{{.Task.Slot}}"
    volumes:
      - crate-master:/data
    hostname: "crate-master-{{.Task.Slot}}"
    networks:
      - crate-net
    command: >
      crate
      -Ccluster.name=data-swarm
      -Cnode.name=crate-master-$${SLOT}
      -Cnode.master=true
      -Cnode.data=false
      -Cnetwork.publish_host=_eth0_
      -Cdiscovery.type='zen'
      -Cdiscovery.seed_hosts=tasks.crate-master
      -Cgateway.expected_data_nodes=4
      -Cgateway.recover_after_data_nodes=2
      -Ccluster.initial_master_nodes=crate-master-{{.Task.Slot}}
    deploy:
      replicas: 1

      placement:
        constraints:
          - node.role == manager
      restart_policy:
        condition: on-failure

  crate-data:
    image: crate:latest
#    ports:
#     - "4200:4200"
#      - "4300:4300"
#      - "5432:5432"
    environment:
      SLOT: "{{.Task.Slot}}"
    volumes:
      - crate-data:/data
    hostname: "crate-data-{{.Task.Slot}}"
    networks:
      - crate-net
    command: >
      crate
      -Ccluster.name=data-swarm
      -Cnode.name=crate-data-$${SLOT}
      -Cnode.master=false
      -Cnode.data=true
      -Cnetwork.publish_host=_eth0_
      -Cdiscovery.type='zen'
      -Cdiscovery.seed_hosts=tasks.crate-master
      -Ccluster.initial_master_nodes=crate-master-1
    deploy:
      replicas: 3
      placement:
        max_replicas_per_node: 1
        constraints:
          - node.role == worker
      restart_policy:
        condition: on-failure

networks:
  crate-net:

volumes:
  crate-master:
    name: 'crate-master-{{.Task.Slot}}'
  crate-data:
    name: 'crate-data-{{.Task.Slot}}'

Maybe even weirder, I’m getting this on the master node:

crate-swarm_crate-master@xxxxx | [2022-09-23T12:38:44,104][WARN ][o.e.c.c.ClusterFormationFailureHelper] [crate-master-1] master not discovered or elected yet, an election requires at least 2 nodes with ids from [rPbVptbIQbKi5NgEXxIp7w, Lnh9hCA0Td6KPRiLqYpr-A, iQzjuzPmRl6KCvq5SUtPYw], have discovered [{crate-master-1}{rPbVptbIQbKi5NgEXxIp7w}{pCtCqGNXRtW_eHrJs35YzQ}{10.0.3.21}{10.0.3.21:4300}{http_address=10.0.3.21:4200}] which is not a quorum; discovery will continue using [ ] from hosts providers and [{crate-master-1}{rPbVptbIQbKi5NgEXxIp7w}{pCtCqGNXRtW_eHrJs35YzQ}{10.0.3.21}{10.0.3.21:4300}{http_address=10.0.3.21:4200}] from last-known cluster state; node term 5, last-accepted version 19 in term 5

The problem is most likely not hostname / IP related, as CrateDB does not use them for the master election.

Did you have more master-eligible nodes before in this cluster?

[rPbVptbIQbKi5NgEXxIp7w, Lnh9hCA0Td6KPRiLqYpr-A, iQzjuzPmRl6KCvq5SUtPYw]

would suggest that there were at least 3 master-eligible nodes in the past.


Generally speaking, you should have at least 3 master-eligible nodes in a multi-node cluster, and for smaller setups it is typically fine to have them act as both master and data nodes.
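
For illustration, here is a minimal sketch of a single combined master/data service based on the stack file above (the service, volume and node names are just examples; the volume and network definitions would follow the same pattern as before). The important parts are replicas: 3 and cluster.initial_master_nodes listing all three node names:

services:
  # sketch: one combined master/data service with 3 replicas,
  # replacing the separate crate-master and crate-data services
  crate:
    image: crate:latest
    ports:
      - "4200:4200"
    environment:
      SLOT: "{{.Task.Slot}}"
    volumes:
      - crate:/data
    hostname: "crate-{{.Task.Slot}}"
    networks:
      - crate-net
    command: >
      crate
      -Ccluster.name=data-swarm
      -Cnode.name=crate-$${SLOT}
      -Cnode.master=true
      -Cnode.data=true
      -Cnetwork.publish_host=_eth0_
      -Cdiscovery.seed_hosts=tasks.crate
      -Cgateway.expected_data_nodes=3
      -Cgateway.recover_after_data_nodes=2
      -Ccluster.initial_master_nodes=crate-1,crate-2,crate-3
    deploy:
      replicas: 3
      placement:
        max_replicas_per_node: 1
        constraints:
          - node.role == worker
      restart_policy:
        condition: on-failure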

OK yeah, that fixed it: either changing the master replicas to 3,
or making the data nodes masters as well.

I also just tested:

  • increasing the replica count on masters after deployment
  • increasing the replica count on masters and removing master from data nodes after deployment

and both scenarios work, so I think your suggestion will work well for a start, as long as there are 3 master nodes.

Just as a note, trying to go back
from 3 master + 3 data
to 1 master + 3 data&master

totally breaks it again.

That is possible, however not just with a full cluster restart.
The voting configuration automatically adjusts (and is stored in the master state) depending on how many master-eligible nodes are in the cluster. Do it one by one and it should work.
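
As a sketch of what "one by one" could look like with the stack file above (assuming the data nodes have already been redeployed with -Cnode.master=true so they can take over the voting seats), the dedicated master service would be scaled down in steps rather than all at once:

  crate-master:
    deploy:
      # reduce master-eligible nodes one at a time:
      #   replicas: 3 -> 2, redeploy, wait for the cluster to be healthy again,
      #   then 2 -> 1, redeploy, wait again
      replicas: 2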

OK, this is SOOOOO unreliable.
With 1 master + 3 data&master nodes it sporadically breaks on redeployments when I make innocuous changes like

-Cgateway.expected_data_nodes=3
to
-Cgateway.expected_data_nodes=4

then it’s the same issue all over again, and the only thing that works is to completely destroy all the Docker volumes and redeploy.
But if this were production, I would lose all my data…

I suppose the question, or more a direction for my own homework, is: how do I handle a situation where there is a full cluster restart?

The setup works that way precisely so that you do not lose data :slight_smile:
If there were 4 master-eligible nodes before (there is only one active master at a time) and you try to restart the cluster with only 1, it should not start, as it cannot be guaranteed that there is no data loss.

Generally speaking, if you are not running more than 6-7 nodes, or the cluster is not under heavy load anyway, it is fine to have combined master/data nodes.

Also, I would not bring the pure master nodes into a query rotation (i.e. there is no need to expose port 4200 on them).
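
In terms of the stack file above, that would mean dropping the ports: section from crate-master and publishing port 4200 on crate-data instead, for example:

  crate-data:
    ports:
      - "4200:4200"   # swarm's routing mesh spreads client requests across the data replicas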


Generally speaking, full cluster restarts are only necessary for major version upgrades, and maybe in the future not even for that anymore :wink: There are production clusters that have been running for years and have only ever done rolling updates.

The context here is that I am running on Jelastic and using Portainer to manage the swarm stack.

There could be reasons to redeploy the Jelastic containers (updating Docker, for example),
or to redeploy the stack from Portainer, which reads a docker-stack.yml file from a git repo. (Updating a Jelastic swarm deployment will basically redeploy every single Docker manager and worker in one swoop.)

But it’s starting to feel like it’s really not suited to being this elastic.

But it’s starting to feel like it’s really not suited to being this elastic.

Still not entirely sure what you want to achieve :smiley:
A full cluster restart is possible of course.


Maybe also the helm chart: crate 0.2.0 · helm/crate
or the operator can be of some inspiration: https://github.com/crate/crate-operator/tree/master/crate

@prodata thank you for all the input.
I think I’ll revert to combined master/data nodes for now and deploy them to the workers.

The goal is just to be able to safely and reliably manage crate with easy elasticity.

That means being able to scale up replicas quickly, and to restart without nodes treating each other as foreign agents. The “stop” services button and redeployment options on Jelastic scare me here, as any of the managed infrastructure team members could decide to do maintenance as they see fit, and they shouldn’t have to worry about CrateDB suddenly refusing to function.

I can at least confirm that on Jelastic the named Docker volumes are persistent, so that isn’t an issue. That makes it even stranger that the cluster ID or the node IDs sometimes change through restarts (even though the node names and cluster names stay the same).

I guess it would be nice to be able to tell it to always trust the names if the IDs do change, or even to manually tell it to accept that the IDs have changed (which I haven’t figured out how to do yet).