How to properly automate autoscaling a cluster?

Hi,

Right now I’m on v3.3.5, a three-node cluster in AWS EC2, and I want to upgrade to v4.2.1.
In our system, everything is automated and we have autoscaling in place. On a button click we have three (or more) new EC2 instances built from scratch, Crate.IO CE installed on them, and they discover each other just fine.

Now, in v4.2.1 the new initial_master_nodes setting has been introduced and is required, which presents a problem for my current automation setup.
The problem is, EC2 instances are starting one-by-one, which means I don’t know all three IPs of my nodes.

So, what is the recommended way for scaling nodes that want to join a cluster?
Is there any way to avoid having to “change” the configuration on the first node, after the second node has been started?

The solution with v3.3.5 right now works very elegantly and nodes can be terminated or recreated at will, and they “just” join the cluster.
It seems somewhat cumbersome if I have to change yml config files on all nodes every time I want to add a new node. I love the fact that I don’t have to think about my cluster, and which nodes are master and which are not. Is there a way to get there with v4.2.1?

You only have to set them for the initial cluster and the first master election, i.e. if you start with a 3-node cluster, all those nodes have to have the same list of initial master nodes. Nodes joining after that don’t need to have it set.

Further, cluster.initial_master_nodes can contain

node names, fully qualified hostnames or IP addresses
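Sketched in crate.yml syntax, any mix of those forms should be accepted (all values below are made-up examples, not from this thread):

```yaml
cluster.initial_master_nodes:
  - node01               # a node name (as set via node.name)
  - crate2.example.com   # a fully qualified hostname
  - 10.0.1.103           # an IP address
```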

Hi @proddata,

Thanks for the reply.

What if I’m starting the instances one by one? I first need to get to my three instances to begin with, right?

Actually, it is sufficient to set cluster.initial_master_nodes on one node; however, it is recommended to set it to the same list for all initial nodes (that should be master-eligible).

Hi @proddata,

Just to make sure I understand you correctly, I could have a script that would run on server startup/setup, and it would check for other instances in the same security group (subnet), and I’d set only those IPs that are available at that moment, and I’d still get a correct cluster? I.e. my configs would look like this:

Server 1

  • cluster.initial_master_nodes:
    • server1

Server 2

  • cluster.initial_master_nodes:
    • server1
    • server2

Server 3

  • cluster.initial_master_nodes:
    • server1
    • server2
    • server3

So, if only one node has all three servers defined as initial master nodes, all nodes in the cluster would recognize those servers, even if they are not listed in their particular config yml, right?

Would the discovery also work if each node has only its own IP listed in the yml as the initial master node? That would simplify things a lot from my perspective.

Thanks

No, unfortunately not.

  • You either have to set cluster.initial_master_nodes to the same value (i.e. 1 or more nodes) for all nodes, or not at all.
  • But at least one node/server needs to have cluster.initial_master_nodes set to one or more master-eligible node(s).

In order to have a scalable cluster, you would need to start with 3 nodes anyway. Therefore you could set a known node name (e.g. node01) and use it within the cluster.initial_master_nodes setting.


Variant 1 (works)
Server 1

  • cluster.initial_master_nodes:
    • server1 (ip, host or node name)

Server 2

  • cluster.initial_master_nodes - not set

Server 3

  • cluster.initial_master_nodes - not set

Variant 2 (works)
Server 1

  • cluster.initial_master_nodes:
    • server1 (ip, host or node name)
    • server2 (ip, host or node name)
    • server3 (ip, host or node name)

Server 2

  • cluster.initial_master_nodes:
    • server1 (ip, host or node name)
    • server2 (ip, host or node name)
    • server3 (ip, host or node name)

Server 3

  • cluster.initial_master_nodes:
    • server1 (ip, host or node name)
    • server2 (ip, host or node name)
    • server3 (ip, host or node name)
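For illustration, Variant 2 collapses to one crate.yml shared by all three servers. A sketch (server1–server3 are placeholder hostnames; discovery.seed_hosts is the CrateDB 4.x setting for the node discovery list, worth verifying for your exact version):

```yaml
# Identical crate.yml on server1, server2 and server3 (placeholder names)
cluster.name: my-cluster
# Consulted on every start so the nodes can find each other:
discovery.seed_hosts:
  - server1
  - server2
  - server3
# Consulted only for the very first master election:
cluster.initial_master_nodes:
  - server1
  - server2
  - server3
```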

What if, at some point in time, node01 becomes corrupt and I have to replace it?
If I remove that node01 and add another node0x, which also has initial_master_nodes “not set”, does that mean that my cluster will continue to operate and work correctly even though at that point no node has initial_master_nodes set?

In any case, yes, unfortunately for me your Variant 1 and 2 are working (of course I have to delete the “data” directory when restarting nodes).
That means I still have the problem where I have to build a custom script to decide “am I the first node in the cluster”, or, the other way around, a script that would be executed each time I add a new node and would then “go and update the configs of existing nodes”.

Or am I missing something?

Thanks

What if, at some point in time, node01 becomes corrupt and I have to replace it?

Well … it really shouldn’t go corrupt :slight_smile: but if it does, it really is not a problem, as the initial master election has been done before.

does that mean that my cluster will continue to operate and work correctly even though at that point no node has initial_master_nodes set?

yes

(of course I have to delete the “data” directory when restarting nodes)

This shouldn’t be necessary if the nodes are started with a correct config.

It seems somewhat cumbersome if I have to change yml config files on all nodes every time I want to add a new node.

This is not the case at all. You can use the same config file for all nodes.


This can all be achieved using Terraform or something similar.
What are you using for your script?
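As a sketch of such a shared, automation-friendly config (assuming CrateDB 4.x’s EC2-based discovery; the discovery.ec2 setting names, the security-group value, and the gateway values are assumptions to verify against the docs for your version):

```yaml
# One crate.yml baked into the EC2 image for every node
cluster.name: my-cluster
# Discover peers through the EC2 API instead of hard-coded IPs:
discovery.seed_providers: ec2
discovery.ec2.groups:
  - my-crate-sg            # hypothetical security group name
# Only the very first master election looks at this; node01 is the
# known node.name given to the bootstrap instance:
cluster.initial_master_nodes:
  - node01
# Full-restart behaviour for the initial 3-node cluster:
gateway.expected_nodes: 3
gateway.recover_after_nodes: 2
```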

I didn’t mean that Crate.IO goes corrupt (because it never does! :smiley: :smiley: :smiley: )… but the EC2 instances have been known to become unstable in rare occasions :slight_smile: (I think this is even mentioned somewhere in the AWS docs)

Jokes aside… and thanks for addressing all my questions :slight_smile:, it seems like I’ve overestimated the importance of the cluster.initial_master_nodes property.
I guess I can relatively easily determine whether the EC2 instance currently being powered up by the autoscaling group is the very first one, and if it is, I could set a specific node name on it, or even its own IP address.
The rest of the nodes simply wouldn’t have the cluster.initial_master_nodes property set at all, and we should be good, right?

One more question please, related to the data directory: If I decide, in future, to have more than three nodes in my cluster, that means that I have to change the properties gateway.expected_nodes and gateway.recover_after_nodes. But, can I do that just like that and then restart my nodes? Will they still be able to re-join the cluster, without deleting the data directory of course? I’m assuming yes, but just to double-check :slight_smile:
I’m referring of course to these comments:

Thanks

Hah :slight_smile: My scenario got me into a race condition:

  • Server 1 starting up, but hasn’t run the setup script yet
  • Server 2 starting up
  • Server 1 running the setup script, but there are already two servers… so the count is 2 and cluster.initial_master_nodes is not correctly calculated…

In the end I had to do some more scripting to get this thing to start up from zero without manual intervention, but it is working now.
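One way to sidestep that race is to pick the bootstrap node deterministically instead of counting servers at script time. A hypothetical sketch, assuming every instance can list its autoscaling group’s members (e.g. via the EC2 API) and that the function and tuple shapes below are made up for illustration:

```python
# Every instance runs the same logic; only the instance with the earliest
# launch time (ties broken by instance id) sets cluster.initial_master_nodes,
# all others leave it unset.

def bootstrap_node(instances):
    """instances: list of (instance_id, launch_time) tuples, e.g. fetched
    from the EC2 API. ISO-8601 launch times compare correctly as strings."""
    return min(instances, key=lambda i: (i[1], i[0]))[0]

def initial_master_nodes_setting(my_id, instances):
    """Return the crate.yml value for this instance, or None to leave it unset."""
    if bootstrap_node(instances) == my_id:
        return [my_id]  # e.g. rendered as the node's own name or IP
    return None

instances = [("i-b", "2020-07-01T10:00:05Z"),
             ("i-a", "2020-07-01T10:00:05Z"),
             ("i-c", "2020-07-01T10:00:01Z")]
print(initial_master_nodes_setting("i-c", instances))  # ['i-c']
print(initial_master_nodes_setting("i-a", instances))  # None
```

The nice property is that the answer doesn’t depend on when the script runs: an instance that only sees itself picks itself, and any later instance that also sees the earlier one picks that earlier one, so all views agree on the single bootstrap node.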

One thought, and please forgive me if this is completely wrong, but I still don’t fully understand why I had to go through this hassle… Why is cluster.initial_master_nodes mandatory at all? In my case, it seems it would be easier if it defaulted to, well, any discoverable node, and if you wanted some nodes not to be master-eligible, you’d explicitly define those. Does that make any sense?

Pretty much all the settings mentioned in this thread are only relevant for a full cluster (re)start and have little to do with scaling out your cluster, i.e. they are not relevant for adding new nodes. However, if you have to restart your cluster (e.g. for a major update), it is good to update the configuration on all your nodes; otherwise you might see high load when only a part of the cluster is up again.

If I decide, in future, to have more than three nodes in my cluster, that means that I have to change the properties gateway.expected_nodes and gateway.recover_after_nodes. But, can I do that just like that and then restart my nodes? Will they still be able to re-join the cluster, without deleting the data directory of course?

You don’t have to, but you should. Yes, you can restart your cluster. Yes, this is possible without deleting the data directory. It only has to be assured that the new nodes have a config that prevents them from starting a new cluster, i.e. it is good to have cluster.initial_master_nodes set to nodes of the existing cluster.


Just to make it clear once again: you can start and run a cluster with a single CrateDB config (crate.yml) that is shared between all nodes. Only for the initial cluster to be formed is it necessary to set an initial master node and the expected nodes in the cluster. It is, however, good practice to update your config when scaling out, before doing a full cluster restart.
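For example, going from three to five nodes, the gateway settings in crate.yml might change like this before the next full restart (a sketch assuming the usual majority recommendation; check the values against your version’s docs):

```yaml
# 3-node cluster (old values):
#   gateway.expected_nodes: 3
#   gateway.recover_after_nodes: 2
# After scaling out to 5 nodes:
gateway.expected_nodes: 5        # wait for all 5 nodes before recovery
gateway.recover_after_nodes: 3   # majority of 5
```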


Why is cluster.initial_master_nodes mandatory at all?

As described here https://crate.io/docs/crate/reference/en/latest/config/cluster.html#discovery, this setting is only needed for the very first election of a master node when forming a new cluster.
It is not needed anymore once a cluster has been started, e.g. when some nodes or the full cluster are restarted.
I have to admit that Elasticsearch does a better job of documenting this setting; see https://www.elastic.co/guide/en/elasticsearch/reference/master/modules-discovery-bootstrap-cluster.html for further explanation.

Yes, it was only after I asked that question that I dug deeper into the ES docs.

Maybe a good approach would be to add references to the corresponding ES documentation in the Crate.IO docs? At least for those config settings that are re-used like that.