Difference between master-eligible nodes and data nodes: can you clarify?

According to "Going into production" (CrateDB: How-Tos): “You can configure a node that only handles client requests and query execution (i.e., is not master-eligible) …”

Let’s assume the following scenarios:

  • Cluster A: An on-premise cluster with 3 nodes.
  • Cluster B: An on-premise cluster with 5 nodes.

In terms of best practices, how many master-eligible nodes are required for each cluster?

Can a master-eligible node also be a data node?

What does it mean for a node to be able to handle client requests but not handle query execution loads or cluster management (i.e., is not master-eligible)? What is the purpose of configuring a node as follows?

node.master: false
node.data: false

Hi @Emre_Sevinc,

for smaller clusters (3, 5, or a similar number of nodes) we don’t recommend configuring special node types; let CrateDB handle it automatically for you.

Special-purpose nodes are useful for setups with several dozen or hundreds of CrateDB nodes.

Can a master-eligible node also be a data node?

Yes, that’s the default: unless configured otherwise, every node is both master-eligible and a data node.

In terms of best practices, how many master-eligible nodes are required for each cluster?

For smaller clusters we recommend making all nodes master-eligible. For big clusters it can be useful to have dedicated master nodes whose sole responsibility is to manage the cluster state.
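As a rough sketch, a dedicated master node in a big cluster could be configured along these lines. This assumes the same node.master and node.data settings quoted in the question, placed in the node's crate.yml, and it is not something you need for a 3-node or 5-node cluster:

```yaml
# Sketch only: dedicated master-eligible node for a large cluster.
# Takes part in master election and manages cluster state,
# but stores no table data.
node.master: true
node.data: false
```

For clusters A and B from the question, leaving both settings at their defaults keeps every node master-eligible and data-bearing.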

What does it mean for a node to be able to handle client requests but not handle query execution loads or cluster management (i.e., is not master-eligible)? What is the purpose of configuring a node as follows?

node.master: false
node.data: false

These would be request handling nodes, which are responsible for (as the name indicates) handling client requests and merging the results from query execution on the distributed data nodes. The following blog post goes into more detail on how this works internally: https://zignar.net/2021/05/20/distributed-select-statement-execution-in-cratedb/

In bigger setups, having dedicated master nodes, data nodes, and client request handling nodes has two main advantages. You can scale each node type independently (think of scaling down request handling nodes on weekends when load is lower), and you can use special-purpose server configurations for specific tasks (e.g. bigger disks for data nodes, I/O optimized nodes for request handling, or similar).
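For completeness, the data-only role in such a split setup might look like the following. This is a sketch that assumes the same node.master and node.data settings from the question, set per node in crate.yml:

```yaml
# Sketch only: dedicated data node for a large cluster.
# Stores data and executes queries on it,
# but is never elected as master.
node.master: false
node.data: true
```

Together with dedicated master nodes and the request handling configuration quoted above (both settings false), this gives three node types that can be sized and scaled separately.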
