How to handle "There are still active requests on this node, delaying graceful shutdown"?

proddata · December 17, 2021, 2:56pm

Before the DECOMMISSION command, it was about 27 GB per node. Now I see that the first node's data disk is almost empty (only 73 MB), while the others have grown to about 40 GB.

Have you set

cluster.graceful_stop.min_availability=full

? It looks like the shards have been relocated to the two remaining nodes. This would be the case if the number of replicas were set to '1' (not '0-1') or if the minimum availability is set to full.
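
To double-check both possibilities, something like the following could be run from any node. This is only a sketch: the first query reads the current graceful stop settings from sys.cluster, the second lists the replica setting per table. The schema filter is an assumption about where your tables live.

```sql
-- Current graceful stop settings (min_availability, timeout, force)
SELECT settings['cluster']['graceful_stop'] AS graceful_stop
FROM sys.cluster;

-- Replica configuration per user table
SELECT table_schema, table_name, number_of_replicas
FROM information_schema.tables
WHERE table_schema NOT IN ('sys', 'information_schema', 'pg_catalog')
ORDER BY table_schema, table_name;
```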

It seems like there’s no psql connection to the first node. But I continue getting:

select name, connections['psql']['open'], connections['http']['open'] from sys.nodes ORDER BY name limit 100;

There might also be open HTTP connections (such as those used by crash). However, I would think this is related to finishing the relocation of shards and to open transport connections.

I also see the following in the logs:

org.elasticsearch.transport.NodeDisconnectedException: [crate-dn-001][192.168.239.30:4300][sql] disconnected

Those are to be expected, as the SQL service is blocking any new incoming connections from your load balancer.

It’s been about 1.5 hours since I issued the decommission command. It still says “There are still active requests on this node, delaying graceful shutdown”.

The default timeout for a graceful shutdown is 2h, after which it fails. By default, the node then goes back into operation.
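
If the relocation simply needs more time, the timeout could be raised before decommissioning. A sketch, where the '4h' value is only an illustrative assumption and not a recommendation:

```sql
-- Allow more time for shards to relocate before the graceful stop gives up
SET GLOBAL TRANSIENT "cluster.graceful_stop.timeout" = '4h';

-- Optionally force the node to stop once the timeout expires
-- (the default is false, i.e. the node stays in the cluster)
SET GLOBAL TRANSIENT "cluster.graceful_stop.force" = true;
```

I have not verified whether changing these settings affects a decommission that is already in progress.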

Is there a way to forcibly stop those “active requests”? (Or see what they actually are?)

If you check

SELECT * FROM sys.shards
where routing_state <> 'STARTED'

and 0 rows are returned, you can also terminate the crate process.
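
For a closer look, a sketch along these lines shows how many shards are in which state on each node, and which statements are currently running. Whether the "active requests" from the log message actually show up in sys.jobs is an assumption on my side, and the job id in the KILL statement is a placeholder.

```sql
-- Shard states per node; the decommissioned node should end up with no shards
SELECT node['name'] AS node_name, routing_state, count(*) AS shards
FROM sys.shards
GROUP BY node['name'], routing_state
ORDER BY node['name'], routing_state;

-- Statements currently executing in the cluster
SELECT id, node['name'] AS node_name, started, username, stmt
FROM sys.jobs
ORDER BY started;

-- To cancel a single job, uncomment and insert an id from the query above
-- (KILL ALL would cancel all of the current user's jobs instead)
-- KILL '<job-id-from-sys.jobs>';
```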
