SQL freezes after sync is done

Hello,

We have a problem with CrateDB in our production environment.

We have 3 nodes in the cluster on 3 different servers. All 3 nodes are eligible to become master.
Each server runs Linux (Red Hat 8.3) and has 60 GB of memory. CrateDB is started with a 30 GB heap, using the following command:
ulimit -u 4096 && CRATE_HEAP_SIZE='30g' && CRATE_JAVA_OPTS='-Xms30g -Xmx30g' /apps/crate-4.6.4/bin/crate
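One detail of this launch line may be worth double-checking. `CRATE_HEAP_SIZE='30g' &&` is a separate command, so `CRATE_HEAP_SIZE` only reaches the crate process if it was already exported elsewhere; `CRATE_JAVA_OPTS`, written directly in front of the command, is passed. A minimal sketch of the difference in a POSIX shell, using a made-up variable `DEMO_HEAP` (not a CrateDB setting):

```shell
#!/bin/sh
# Shows why `VAR=value && command` does not hand VAR to the command,
# while `VAR=value command` does. DEMO_HEAP is a hypothetical name.
unset DEMO_HEAP

# Pattern used for CRATE_HEAP_SIZE: plain assignment, then `&&`.
# DEMO_HEAP becomes a shell variable only; it is not exported to child processes.
DEMO_HEAP='30g' && with_and=$(sh -c 'echo "${DEMO_HEAP:-unset}"')
unset DEMO_HEAP

# Pattern used for CRATE_JAVA_OPTS: prefix assignment.
# DEMO_HEAP is placed in the environment of this one command only.
with_prefix=$(DEMO_HEAP='30g' sh -c 'echo "${DEMO_HEAP:-unset}"')

echo "with &&:     $with_and"     # prints: with &&:     unset
echo "with prefix: $with_prefix"  # prints: with prefix: 30g
```

Since `CRATE_JAVA_OPTS` uses the prefix form, the `-Xms30g -Xmx30g` flags should still reach the start script; writing `CRATE_HEAP_SIZE=30g` in prefix position too (or exporting it) would remove the ambiguity. Which heap the JVM actually received can be checked, for example, by looking for the `-Xmx` flag in the `ps` output of the java process.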

We are currently running CrateDB 4.6.4.
The problem started when we upgraded to version 4.3.

When we start the cluster, SQL over HTTP works.
As soon as the synchronization reaches 100%, SQL is no longer accessible.

We have enabled debug-level logging.
We see some activity in the logs, but the HTTP service does not respond.
No error appears.

The configuration is quite simple:

cluster.name: clustername
node.name: "node1"

path.data: /data/crate/clustername/

gateway.expected_nodes: 3
gateway.recover_after_nodes: 2

network.host: 10.135.x.y
network.bind_host: "dns_alias_for_10.135.x.y"

node.master: true
discovery.seed_hosts:
    - "10.135.x.x"
    - "10.135.x.y"
    - "10.135.x.z"
cluster.initial_master_nodes:
    - "10.135.x.x"
    - "10.135.x.y"
    - "10.135.x.z"

Do you have any idea how we could solve this problem?
In the meantime, to keep the nodes up, we delete the /data/crate/clustername/ directory to force a new synchronization.
That is just to keep SQL over HTTP available.

Thanks for your help!


By SQL via HTTP, do you mean the Admin UI or the HTTP interface (i.e. the /_sql endpoint)?

Hello, I mean both.

Neither the Admin UI nor the HTTP interface responds.
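To tell a hung endpoint apart from a slow one, it may help to probe each node with a hard timeout. A minimal sketch, assuming curl is installed and the nodes use CrateDB's default HTTP port 4200 unless `HOST:PORT` is given; `with_default_port` and `probe_node` are hypothetical helper names. A curl exit status of 28 means the request timed out, 7 that the connection was refused, and 22 (with `-f`) that the server returned an HTTP error.

```shell
#!/bin/sh
# Sketch: check which nodes still answer on the HTTP /_sql endpoint.
# Assumptions: curl is installed; default HTTP port 4200 unless HOST:PORT is given.

# Append the default HTTP port when none is given.
with_default_port() {
  case "$1" in
    *:*) printf '%s\n' "$1" ;;
    *)   printf '%s:4200\n' "$1" ;;
  esac
}

# Send a trivial statement. --max-time turns a hang into a clear failure,
# and -f makes HTTP error responses count as failures too.
probe_node() {
  target=$(with_default_port "$1")
  if curl -fsS -o /dev/null --max-time 10 \
       -H 'Content-Type: application/json' \
       -d '{"stmt": "SELECT 1"}' \
       "http://$target/_sql"; then
    echo "OK    $target"
  else
    echo "FAIL  $target (curl exit $?)"
  fi
}

# Probe every host given on the command line.
for host in "$@"; do
  probe_node "$host"
done
```

Saved as a hypothetical `probe.sh`, it could be run as `sh probe.sh 10.135.x.x 10.135.x.y 10.135.x.z` once before and once after the synchronization reaches 100%, to see whether all nodes stop answering or only some.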

Last night we tried to build a new cluster, moving the tables over with COPY TO / COPY FROM.
With a single node, everything works.
When we start a second node and the synchronization completes, there is no response anymore.

Could it be related to a table setting?
We recreated the tables from the existing SHOW CREATE TABLE output, with identical parameters.

Thank you!



Do you start one node, write data to it, and then add a second one?
Do you still see shards being moved from one node to another?
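For the second question, the `routing_state` column of `sys.shards` shows whether shards are still `RELOCATING` or `INITIALIZING`. A sketch that polls it, assuming curl, the default HTTP port 4200, and that `/_sql` still answers at the moment of the query (which, given the freeze, it may not); `sql_body`, `shard_states` and `watch_shards` are hypothetical helper names, and the 30-second interval is arbitrary.

```shell
#!/bin/sh
# Sketch: count shards per routing state via the /_sql endpoint.
# Assumptions: curl is installed; the host serves HTTP on port 4200.
SHARD_STATE_STMT='SELECT routing_state, count(*) FROM sys.shards GROUP BY routing_state ORDER BY routing_state'

# Wrap a statement in the JSON body /_sql expects.
# No escaping is done: the statement must not contain double quotes or backslashes.
sql_body() {
  printf '{"stmt": "%s"}' "$1"
}

# Query shard states once, giving up after 10 seconds if the node hangs.
shard_states() {
  curl -sS --max-time 10 \
    -H 'Content-Type: application/json' \
    -d "$(sql_body "$SHARD_STATE_STMT")" \
    "http://$1:4200/_sql"
}

# Poll until interrupted (Ctrl-C), with a timestamp per sample, so the moment
# relocation stops can be lined up with the moment SQL stops answering.
watch_shards() {
  while :; do
    date
    shard_states "$1" || echo "no answer from $1" >&2
    echo
    sleep 30
  done
}
```

After sourcing the file, `watch_shards 10.135.x.y` would print one sample every 30 seconds; if the last successful sample still shows non-`STARTED` shards, the freeze coincides with relocation rather than following it.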