CrateDB cluster: how to use all the nodes

Hi everyone,

I have a 3-node CrateDB cluster like this:

CONTAINER ID   IMAGE         COMMAND                  CREATED       STATUS       PORTS                                                                                            NAMES
3e73dabc643c   crate:5.0.0   "/docker-entrypoint.…"   5 weeks ago   Up 5 weeks   4300/tcp, 0.0.0.0:4202->4200/tcp, :::4202->4200/tcp, 0.0.0.0:5433->5432/tcp, :::5433->5432/tcp   work-cratedb02-1
d35b21cd6e8b   crate:5.0.0   "/docker-entrypoint.…"   5 weeks ago   Up 7 days    4300/tcp, 0.0.0.0:5432->5432/tcp, :::5432->5432/tcp, 0.0.0.0:4201->4200/tcp, :::4201->4200/tcp   work-cratedb01-1
205e6ba07384   crate:5.0.0   "/docker-entrypoint.…"   5 weeks ago   Up 2 weeks   4300/tcp, 0.0.0.0:4203->4200/tcp, :::4203->4200/tcp, 0.0.0.0:5434->5432/tcp, :::5434->5432/tcp   work-cratedb03-1

And I am using the HTTP endpoint to run SQL, with something like this:

curl -sS -H 'Content-Type: application/json' -X POST 'InstanceIP:4201/_sql' -d '{"stmt":"select * from demo_table limit 2"}'

I want to know: is there a way to avoid having to specify the port of a single node in the URL (like 4201)? As mentioned above there are three nodes, and I want to utilize all three for reads and writes instead of just one, so that requests are load-balanced across all three nodes and read and write performance improves.

Any help would be much appreciated.

Hi,

I want to utilize all three for reads and writes instead of just one, so that requests are load-balanced across all three nodes and read and write performance improves

There are two sides to this: one is which node your client application (curl in this case) sends the request to; the other is the execution of the query itself.
Regardless of which node receives the query, CrateDB will try to use the resources of all nodes to complete the request as quickly as possible. You can read about sharding and replicas; that is how CrateDB distributes data across the nodes.
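
For illustration, here is a minimal sketch of how shards and replicas are declared when creating a table, and how to check which node holds each shard. The table name, columns, and shard/replica counts are made up, and any of the three HTTP ports can receive the statements:

# Hypothetical table spread across the cluster: 6 shards, 1 replica per shard
curl -sS -H 'Content-Type: application/json' -X POST 'InstanceIP:4201/_sql' -d '{"stmt":"CREATE TABLE demo_sharded (id INT, payload TEXT) CLUSTERED INTO 6 SHARDS WITH (number_of_replicas = 1)"}'

# See where the shards ended up via the built-in sys.shards table
curl -sS -H 'Content-Type: application/json' -X POST 'InstanceIP:4201/_sql' -d '{"stmt":"SELECT table_name, id, node FROM sys.shards ORDER BY table_name, id"}'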

Which node your client application sends the request to still matters, so that requests can be balanced when there are many clients, and so that another node can still be reached if one node becomes unavailable.
Ideally you would deploy what is called a load balancer, the same kind you could use for any HTTP web server; a sketch follows below. If you are looking for a simpler solution, you could expose all nodes on the same port but on different IP addresses and use DNS round-robin.
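
As a rough sketch of the load balancer option, assuming HAProxy and the host port mappings shown in the docker ps output above (InstanceIP stands for the Docker host address, as in the curl example), the configuration could look something like this:

# haproxy.cfg -- minimal sketch, not tuned for production
defaults
    mode http
    timeout connect 5s
    timeout client  30s
    timeout server  30s

frontend crate_http
    bind *:4200
    default_backend crate_nodes

backend crate_nodes
    balance roundrobin
    server crate01 InstanceIP:4201 check
    server crate02 InstanceIP:4202 check
    server crate03 InstanceIP:4203 check

You could run it, for example, as another container on the same host (e.g. docker run -d -p 4200:4200 -v "$PWD/haproxy.cfg:/usr/local/etc/haproxy/haproxy.cfg:ro" haproxy) and point every client at InstanceIP:4200/_sql; HAProxy then spreads requests over the three nodes and skips a node whose check fails.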
