How to handle "There are still active requests on this node, delaying graceful shutdown"?

Emre_Sevinc · December 23, 2021, 8:43am

To wrap up this thread: I stopped and started CrateDB on the first node, waited for it to rejoin the cluster, and then repeated the whole process, this time with a successful graceful shutdown. Then I upgraded the CrateDB Debian packages and repeated the same steps on the other nodes, finishing the cluster upgrade in about 20 minutes.

I think the graceful shutdown on the first node got stuck waiting for the following reason:

After issuing the DECOMMISSION command via crash on the first node, I waited for about 1 minute and then exited the crash command line utility with Ctrl-C.
I exited (Ctrl-C) because I assumed it was safe to do so, and because I expected the decommission process to have finished by then. Apparently, it takes more than a few minutes!
In the successful case, about 7-8 minutes passed between issuing ALTER CLUSTER DECOMMISSION 'whatever-node-name-... ; and receiving the ALTER OK, 1 row affected message.
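
For anyone following along, here is a rough sketch of how progress could be watched from a second crash session instead of interrupting the first one. It was not part of the procedure described above, and it assumes that sys.shards exposes node['name'], "primary" and routing_state as documented for recent CrateDB versions.

```sql
-- Count shards per node, split into primaries and replicas.
-- Assumption: with min_availability = PRIMARIES, the node being
-- decommissioned should gradually stop holding primary shards.
SELECT node['name'] AS node_name,
       "primary",
       routing_state,
       count(*) AS shard_count
FROM sys.shards
GROUP BY node['name'], "primary", routing_state
ORDER BY node['name'], "primary", routing_state;
```

Re-running this every minute or so should show whether shards are still being moved while the ALTER CLUSTER DECOMMISSION statement has not yet returned.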

During that process, min_availability was PRIMARIES (I never changed that):

 select settings['cluster']['graceful_stop']['min_availability'] from sys.cluster limit 100;
 settings['cluster']['graceful_stop']['min_availability']
----------------------------------------------------------
 PRIMARIES
(1 row)
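
A small sketch of how the other graceful stop settings could be checked in the same way. The timeout and force keys are taken from the CrateDB settings documentation as I understand it, not from the output above, so treat them as assumptions.

```sql
-- Show the graceful stop settings side by side.
-- Assumption: 'timeout' and 'force' exist under cluster.graceful_stop
-- on the CrateDB version in use.
SELECT settings['cluster']['graceful_stop']['min_availability'] AS "min_availability",
       settings['cluster']['graceful_stop']['timeout'] AS "timeout",
       settings['cluster']['graceful_stop']['force'] AS "force"
FROM sys.cluster;
```

If the timeout turns out to be shorter than the 7-8 minutes the decommission needed here, that would be worth knowing before starting a rolling upgrade.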

And the largest time series table had '0-1' as the number of replicas:

 number_of_replicas = '0-1',
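
To see the replica configuration of all user tables at once, something like the following could be used. This is a sketch that assumes the number_of_shards and number_of_replicas columns of information_schema.tables.

```sql
-- List shard and replica configuration for user tables.
-- Tables whose replica range starts at 0 may have no copy on another
-- node, so their primaries presumably have to be moved before the node
-- can stop with min_availability = PRIMARIES.
SELECT table_schema, table_name, number_of_shards, number_of_replicas
FROM information_schema.tables
WHERE table_schema NOT IN ('sys', 'information_schema', 'pg_catalog')
ORDER BY table_schema, table_name;
```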
