How to handle "There are still active requests on this node, delaying graceful shutdown"?

Before the DECOMMISSION command, each node held about 27 GB. Now the first node’s data disk is almost empty (only 73 MB), while the other two have grown to about 40 GB each.
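A quick sanity check on those numbers: 3 × 27 GB ≈ 81 GB before, and 2 × 40 GB + 73 MB ≈ 80 GB after. The total stayed about the same, so the first node's data appears to have moved to the other two nodes rather than disappeared. A sketch of a query to see the per-node shard volume directly, assuming a CrateDB version where sys.shards exposes node['name']. The size column is in bytes, and actual disk usage can differ somewhat because of translog and other files:

```sql
-- Shard count and total shard size per node, converted from bytes to GiB
SELECT node['name'] AS node_name,
       count(*) AS shard_count,
       sum(size) / 1073741824.0 AS size_gib
FROM sys.shards
GROUP BY node['name']
ORDER BY node_name;
```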

Have you set the following?

cluster.graceful_stop.min_availability=full

It looks like the shards have been relocated to the two remaining nodes. That would be the case if the number of replicas is set to '1' (not '0-1') or if min_availability is set to 'full'.
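A sketch for checking both conditions mentioned above. The system tables used here are standard in CrateDB, but verify the exact columns against the docs for your version. For context: with 'primaries' (the default), only the primary shards must stay available after the node shuts down and replicas may go missing, whereas 'full' also requires every replica, which usually means moving all shards off the node. The last statement is only relevant if you do not want that full relocation:

```sql
-- Current graceful-stop settings (min_availability, timeout, force, ...)
SELECT settings['cluster']['graceful_stop'] AS graceful_stop
FROM sys.cluster;

-- Replica setting per user table, e.g. '1' versus '0-1'
SELECT table_schema, table_name, number_of_replicas
FROM information_schema.tables
WHERE table_schema NOT IN ('sys', 'information_schema', 'pg_catalog')
ORDER BY table_schema, table_name;

-- Only if needed: go back to the default availability level
SET GLOBAL TRANSIENT cluster.graceful_stop.min_availability = 'primaries';
```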

It seems like there’s no psql connection to the first node, but I keep getting the same message. This is how I checked:

select name, connections['psql']['open'], connections['http']['open'] from sys.nodes ORDER BY name limit 100;

There might also be open HTTP connections (like the ones used by crash). However, I would think this is related to the shard relocation that is still finishing up and to open transport connections.
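A sketch for looking at both things mentioned above: open connections per protocol, and shards that are still moving. It assumes your CrateDB version exposes connections['transport'] in sys.nodes. If it does not, drop that column:

```sql
-- Open connections per node and protocol
SELECT name,
       connections['psql']['open'] AS psql_open,
       connections['http']['open'] AS http_open,
       connections['transport']['open'] AS transport_open
FROM sys.nodes
ORDER BY name;

-- Shards that are still relocating or initializing
SELECT schema_name, table_name, id, "primary",
       node['name'] AS node_name, routing_state, relocating_node
FROM sys.shards
WHERE routing_state IN ('RELOCATING', 'INITIALIZING')
ORDER BY schema_name, table_name, id;
```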

I also see the following in the logs:

org.elasticsearch.transport.NodeDisconnectedException: [crate-dn-001][192.168.239.30:4300][sql] disconnected

Those are expected, as the SQL service is blocking any new incoming connections from your load balancer (LB).

It’s been about 1.5 hours since I issued the decommission command. It still says “There are still active requests on this node, delaying graceful shutdown”.

The default timeout for a graceful shutdown is 2h. After that, the shutdown fails and, by default, the node goes back into operation.
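Two cluster settings control this behavior. cluster.graceful_stop.timeout defaults to 2h. cluster.graceful_stop.force defaults to false, which is why the node goes back into operation instead of stopping. A sketch, with the timeout value chosen only as an example. Changed values may only apply to a decommission started after the change:

```sql
-- Give the graceful stop more time (example value)
SET GLOBAL TRANSIENT cluster.graceful_stop.timeout = '3h';

-- Stop the node anyway once the timeout is reached,
-- instead of putting it back into operation
SET GLOBAL TRANSIENT cluster.graceful_stop.force = true;
```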

Is there a way to forcibly stop those “active requests”? (Or see what they actually are?)
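One way to look, assuming the "active requests" are in-flight SQL jobs. That is not guaranteed, and internal shard recovery traffic will not show up here. sys.jobs lists the running statements (including the query itself), and KILL cancels them. The job id below is a placeholder, and the node column of sys.jobs may not exist in older versions:

```sql
-- Statements currently running, oldest first
SELECT id, node['name'] AS node_name, started, stmt
FROM sys.jobs
ORDER BY started;

-- Cancel one specific job (replace the placeholder with an id from above)
KILL '00000000-0000-0000-0000-000000000000';

-- Or cancel every running job in the cluster (use with care)
KILL ALL;
```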

If you check

SELECT * FROM sys.shards
where routing_state <> 'STARTED'

and 0 rows are returned, you can also terminate the crate process.
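Before terminating, it may also be worth confirming that the node being decommissioned no longer holds any shards. A sketch, assuming that node is crate-dn-001 (the name from the log line above; adjust it if the decommissioned node is a different one). With min_availability set to 'full', this should return no rows. With 'primaries', some replica copies may still be listed:

```sql
-- Shards still allocated on the node being decommissioned
SELECT routing_state, count(*) AS shards
FROM sys.shards
WHERE node['name'] = 'crate-dn-001'
GROUP BY routing_state;
```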