Master not discovered after master restart in Docker Swarm

Hi, I’m dealing with a really strange issue. I got a 4-node (1 master, 3 data) cluster working, and all the nodes could see each other with no issues.

The problem came when I decided to test how things would behave through a full host restart (all hosts). The config stayed the same, but suddenly CrateDB can’t seem to find the master.
Networking is fine: the hostnames resolve and can be pinged. What’s even stranger is that it DOES discover the master, but I think because it’s now on a different IP it no longer recognises it.

Node names:

  • crate-master-1
  • crate-data-1
  • crate-data-2
  • crate-data-3

[2022-09-23T12:03:07,705][WARN ][o.e.c.c.ClusterFormationFailureHelper] [crate-data-1] master not discovered yet: have discovered [{crate-data-1}{3OkXgv94Ty-_7XF_w8NFUg}{EnL7NCzhT0CZB-O-6U1RSA}{10.0.3.22}{10.0.3.22:4300}{http_address=10.0.3.22:4200}, {crate-master-1}{rPbVptbIQbKi5NgEXxIp7w}{pCtCqGNXRtW_eHrJs35YzQ}{10.0.3.21}{10.0.3.21:4300}{http_address=10.0.3.21:4200}]; discovery will continue using [10.0.3.21:4300] from hosts providers and [ ] from last-known cluster state; node term 0, last-accepted version 0 in term 0

As you can see in the log, crate-data-1 DOES discover crate-master-1!

Here is my swarm stack file:

version: '3.8'

services:
  crate-master:
    image: crate:latest
    ports:
      - "4200:4200"
#      - "4300:4300"
#      - "5432:5432"
    environment:
      SLOT: "{{.Task.Slot}}"
    volumes:
      - crate-master:/data
    hostname: "crate-master-{{.Task.Slot}}"
    networks:
      - crate-net
    command: >
      crate
      -Ccluster.name=data-swarm
      -Cnode.name=crate-master-$${SLOT}
      -Cnode.master=true
      -Cnode.data=false
      -Cnetwork.publish_host=_eth0_
      -Cdiscovery.type='zen'
      -Cdiscovery.seed_hosts=tasks.crate-master
      -Cgateway.expected_data_nodes=4
      -Cgateway.recover_after_data_nodes=2
      -Ccluster.initial_master_nodes=crate-master-{{.Task.Slot}}
    deploy:
      replicas: 1

      placement:
        constraints:
          - node.role == manager
      restart_policy:
        condition: on-failure

  crate-data:
    image: crate:latest
#    ports:
#     - "4200:4200"
#      - "4300:4300"
#      - "5432:5432"
    environment:
      SLOT: "{{.Task.Slot}}"
    volumes:
      - crate-data:/data
    hostname: "crate-data-{{.Task.Slot}}"
    networks:
      - crate-net
    command: >
      crate
      -Ccluster.name=data-swarm
      -Cnode.name=crate-data-$${SLOT}
      -Cnode.master=false
      -Cnode.data=true
      -Cnetwork.publish_host=_eth0_
      -Cdiscovery.type='zen'
      -Cdiscovery.seed_hosts=tasks.crate-master
      -Ccluster.initial_master_nodes=crate-master-1
    deploy:
      replicas: 3
      placement:
        max_replicas_per_node: 1
        constraints:
          - node.role == worker
      restart_policy:
        condition: on-failure

networks:
  crate-net:

volumes:
  crate-master:
    name: 'crate-master-{{.Task.Slot}}'
  crate-data:
    name: 'crate-data-{{.Task.Slot}}'

Maybe even weirder, I’m getting this on the master node:

crate-swarm_crate-master@xxxxx | [2022-09-23T12:38:44,104][WARN ][o.e.c.c.ClusterFormationFailureHelper] [crate-master-1] master not discovered or elected yet, an election requires at least 2 nodes with ids from [rPbVptbIQbKi5NgEXxIp7w, Lnh9hCA0Td6KPRiLqYpr-A, iQzjuzPmRl6KCvq5SUtPYw], have discovered [{crate-master-1}{rPbVptbIQbKi5NgEXxIp7w}{pCtCqGNXRtW_eHrJs35YzQ}{10.0.3.21}{10.0.3.21:4300}{http_address=10.0.3.21:4200}] which is not a quorum; discovery will continue using [ ] from hosts providers and [{crate-master-1}{rPbVptbIQbKi5NgEXxIp7w}{pCtCqGNXRtW_eHrJs35YzQ}{10.0.3.21}{10.0.3.21:4300}{http_address=10.0.3.21:4200}] from last-known cluster state; node term 5, last-accepted version 19 in term 5

The problem is most likely not hostname / IP related, as CrateDB does not use them for the master election.

Did you have more master-eligible nodes before in this cluster?

[rPbVptbIQbKi5NgEXxIp7w, Lnh9hCA0Td6KPRiLqYpr-A, iQzjuzPmRl6KCvq5SUtPYw]

would suggest that there were at least 3 master-eligible nodes in the past.


Generally speaking, you should have at least 3 master-eligible nodes in a multi-node cluster, and for smaller setups it is typically fine to have them act as both master and data nodes.
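
For illustration, here is a minimal sketch of a single combined master/data service based on the stack file above (the service, volume and node names are just examples; the volume and network definitions would follow the same pattern as before). The important parts are replicas: 3 and cluster.initial_master_nodes listing all three node names:

services:
  # sketch: one combined master/data service with 3 replicas,
  # replacing the separate crate-master and crate-data services
  crate:
    image: crate:latest
    ports:
      - "4200:4200"
    environment:
      SLOT: "{{.Task.Slot}}"
    volumes:
      - crate:/data
    hostname: "crate-{{.Task.Slot}}"
    networks:
      - crate-net
    command: >
      crate
      -Ccluster.name=data-swarm
      -Cnode.name=crate-$${SLOT}
      -Cnode.master=true
      -Cnode.data=true
      -Cnetwork.publish_host=_eth0_
      -Cdiscovery.seed_hosts=tasks.crate
      -Cgateway.expected_data_nodes=3
      -Cgateway.recover_after_data_nodes=2
      -Ccluster.initial_master_nodes=crate-1,crate-2,crate-3
    deploy:
      replicas: 3
      placement:
        max_replicas_per_node: 1
        constraints:
          - node.role == worker
      restart_policy:
        condition: on-failure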

OK yeah, that fixed it: either changing the master replicas to 3,
or making the data nodes masters as well.

I also just tested:

  • increasing the replica count on masters after deployment
  • increasing the replica count on masters and removing master from data nodes after deployment

and both scenarios work, so I think your suggestion will work well for a start, as long as there are 3 master nodes.

Just as a note, trying to go back
from 3 master + 3 data
to 1 master + 3 data&master

totally breaks it again.

That is possible, however not just with a full cluster restart.
The voting configuration automatically adjusts (and is stored in the master state) depending on how many master-eligible nodes are in the cluster. Do it one by one and it should work.
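
As a sketch of what "one by one" could look like with the stack file above (assuming the data nodes have already been redeployed with -Cnode.master=true so they can take over the voting seats), the dedicated master service would be scaled down in steps rather than all at once:

  crate-master:
    deploy:
      # reduce master-eligible nodes one at a time:
      #   replicas: 3 -> 2, redeploy, wait for the cluster to be healthy again,
      #   then 2 -> 1, redeploy, wait again
      replicas: 2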

OK, this is SOOOOO unreliable.
With 1 master + 3 data&master nodes it sporadically breaks on redeployments when I make innocuous changes like

-Cgateway.expected_data_nodes=3
to
-Cgateway.expected_data_nodes=4

then it’s the same issue all over again, and the only thing that works is to completely destroy all the Docker volumes and redeploy.
But if this were production, I would lose all my data…

I suppose the question, or more a direction for my own homework, is: how do I handle a situation where there is a full cluster restart?

The setup works that way precisely so that you do not lose data :slight_smile:
If there were 4 master-eligible nodes before (there is only one active master at a time) and you try to restart the cluster with only 1, it should not start, as it cannot be guaranteed that there is no data loss.

Generally speaking, if you are not running more than 6-7 nodes, or the cluster is not under heavy load anyway, it is fine to have combined master/data nodes.

Also, I would not bring the pure master nodes into a query rotation (i.e. there is no need to expose port 4200 on them).
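
In terms of the stack file above, that would mean dropping the ports: section from crate-master and publishing port 4200 on crate-data instead, for example:

  crate-data:
    ports:
      - "4200:4200"   # swarm's routing mesh spreads client requests across the data replicas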


Generally speaking, full cluster restarts are only necessary for major version upgrades, and maybe in the future not even for that anymore :wink: There are production clusters that have been running for years and have only ever done rolling updates.

The context here is that I am running on Jelastic and using Portainer to manage the swarm stack.

There could be reasons to redeploy the Jelastic containers (updating Docker, for example),
or to redeploy the stack from Portainer, which reads a docker-stack.yml file from a git repo. (Updating a Jelastic swarm deployment will basically redeploy every single Docker manager and worker in one swoop.)

But it’s starting to feel like it’s really not suited to being this elastic.

But it’s starting to feel like it’s really not suited to being this elastic.

Still not entirely sure what you want to achieve :smiley:
A full cluster restart is possible of course.


Maybe also the helm chart: crate 0.2.0 · helm/crate
or the operator can be of some inspiration: https://github.com/crate/crate-operator/tree/master/crate

@prodata thank you for all the input.
I think I’ll revert to combined master/data nodes for now and deploy them to the workers.

The goal is just to be able to safely and reliably manage crate with easy elasticity.

That means being able to scale up replicas quickly, and to restart without nodes treating each other as foreign agents. The “stop” services button and redeployment options on Jelastic scare me here, as any of the managed infrastructure team members could decide to do maintenance as they see fit, and they shouldn’t have to worry about CrateDB suddenly refusing to function.

I can at least confirm that on Jelastic the named Docker volumes are persistent, so that isn’t an issue. That makes it even stranger that the cluster ID or the node IDs sometimes change through restarts (even though the node names and cluster names stay the same).

I guess it would be nice to be able to tell it to always trust the names if the IDs do change, or even to manually tell it to accept that the IDs have changed (which I haven’t figured out how to do yet).