Hello,
To do a rolling upgrade from 4.6.5 to 4.6.6 I’m trying to gracefully shutdown our 3-node CrateDB cluster.
There are some .NET services constantly sending data via HAProxy to this 3-node cluster.
On Node 01, I gave the following commands:
SET GLOBAL TRANSIENT "cluster.routing.allocation.enable" = 'new_primaries';
ALTER CLUSTER DECOMMISSION 'crate-dn-001' ;
But it can’t gracefully shutdown. When I check the logs, I see the following:
[2021-12-17T13:06:59,850][INFO ][o.e.c.s.ClusterSettings ] [crate-dn-001] updating [cluster.routing.allocation.enable] from [all] to [new_primaries]
[2021-12-17T13:09:06,119][INFO ][o.e.c.s.ClusterSettings ] [crate-dn-001] updating [crate.internal.decommission.] from [{}] to [{"FM5C1UbYRJSyPnRQytu25Q":"true"}]
[2021-12-17T13:09:27,464][WARN ][i.n.c.DefaultChannelPipeline] [crate-dn-001] An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last ha\
ndler in the pipeline did not handle the exception.
org.elasticsearch.transport.NodeDisconnectedException: [crate-dn-001][192.168.239.30:4300][sql] disconnected
[2021-12-17T13:09:27,468][WARN ][i.n.c.DefaultChannelPipeline] [crate-dn-001] An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last ha\
ndler in the pipeline did not handle the exception.
org.elasticsearch.transport.NodeDisconnectedException: [crate-dn-001][192.168.239.30:4300][sql] disconnected
[2021-12-17T13:09:40,055][WARN ][i.c.p.p.Messages ] [crate-dn-001] [crate-dn-001][192.168.239.30:4300][sql] disconnected
[2021-12-17T13:10:41,213][WARN ][i.c.p.p.Messages ] [crate-dn-001] [crate-dn-001][192.168.239.30:4300][sql] disconnected
[2021-12-17T13:11:42,327][WARN ][i.c.p.p.Messages ] [crate-dn-001] [crate-dn-001][192.168.239.30:4300][sql] disconnected
[2021-12-17T13:12:12,985][WARN ][i.c.p.p.Messages ] [crate-dn-001] [crate-dn-001][192.168.239.30:4300][sql] disconnected
[2021-12-17T13:13:14,136][WARN ][i.c.p.p.Messages ] [crate-dn-001] [crate-dn-001][192.168.239.30:4300][sql] disconnected
[2021-12-17T13:13:30,643][WARN ][i.c.p.p.Messages ] [crate-dn-001] [crate-dn-001][192.168.239.30:4300][sql] disconnected
[2021-12-17T13:13:33,920][WARN ][i.c.p.p.Messages ] [crate-dn-001] [crate-dn-001][192.168.239.30:4300][sql] disconnected
[2021-12-17T13:14:15,316][WARN ][i.c.p.p.Messages ] [crate-dn-001] [crate-dn-001][192.168.239.30:4300][sql] disconnected
[2021-12-17T13:14:28,948][WARN ][i.n.c.DefaultChannelPipeline] [crate-dn-001] An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last ha\
ndler in the pipeline did not handle the exception.
org.elasticsearch.transport.NodeDisconnectedException: [crate-dn-001][192.168.239.30:4300][sql] disconnected
[2021-12-17T13:14:49,043][WARN ][o.e.i.IndexService ] [crate-dn-001] [data.meterreading][data.meterreading][3] failed to execute global checkpoint sync
org.elasticsearch.index.shard.ShardNotInPrimaryModeException: CurrentState[STARTED] shard is not in primary mode
at org.elasticsearch.index.shard.IndexShard.lambda$wrapPrimaryOperationPermitListener$21(IndexShard.java:2667) [crate-server.jar:?]
at org.elasticsearch.action.ActionListener$2.onResponse(ActionListener.java:90) [crate-server.jar:?]
at org.elasticsearch.index.shard.IndexShardOperationPermits.acquire(IndexShardOperationPermits.java:296) [crate-server.jar:?]
at org.elasticsearch.index.shard.IndexShardOperationPermits.acquire(IndexShardOperationPermits.java:246) [crate-server.jar:?]
at org.elasticsearch.index.shard.IndexShard.acquirePrimaryOperationPermit(IndexShard.java:2638) [crate-server.jar:?]
at org.elasticsearch.index.shard.IndexShard.runUnderPrimaryPermit(IndexShard.java:2718) [crate-server.jar:?]
at org.elasticsearch.index.IndexService.sync(IndexService.java:716) [crate-server.jar:?]
at org.elasticsearch.index.IndexService.maybeSyncGlobalCheckpoints(IndexService.java:696) [crate-server.jar:?]
at org.elasticsearch.index.IndexService$AsyncGlobalCheckpointTask.runInternal(IndexService.java:869) [crate-server.jar:?]
at org.elasticsearch.index.IndexService$AsyncGlobalCheckpointTask.runInternal(IndexService.java:869) [crate-server.jar:?]
at org.elasticsearch.common.util.concurrent.AbstractAsyncTask.run(AbstractAsyncTask.java:143) [crate-server.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
at java.lang.Thread.run(Thread.java:831) [?:?]
[2021-12-17T13:15:00,062][WARN ][i.c.p.p.Messages ] [crate-dn-001] [crate-dn-001][192.168.239.30:4300][sql] disconnected
[2021-12-17T13:15:00,125][WARN ][i.c.p.p.Messages ] [crate-dn-001] [crate-dn-001][192.168.239.30:4300][sql] disconnected
[2021-12-17T13:15:07,097][WARN ][i.c.p.p.Messages ] [crate-dn-001] [crate-dn-001][192.168.239.30:4300][sql] disconnected
[2021-12-17T13:15:41,225][WARN ][i.c.p.p.Messages ] [crate-dn-001] [crate-dn-001][192.168.239.30:4300][sql] disconnected
[2021-12-17T13:16:48,934][WARN ][i.n.c.DefaultChannelPipeline] [crate-dn-001] An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last ha\
ndler in the pipeline did not handle the exception.
org.elasticsearch.transport.NodeDisconnectedException: [crate-dn-001][192.168.239.30:4300][sql] disconnected
[2021-12-17T13:19:11,730][WARN ][i.c.p.p.Messages ] [crate-dn-001] [crate-dn-001][192.168.239.30:4300][sql] disconnected
[2021-12-17T13:19:33,487][INFO ][i.c.a.HttpAuthUpstreamHandler] [crate-dn-001] password authentication failed for user=energytix from connection=/192.168.239.20
[2021-12-17T13:19:35,797][WARN ][i.n.c.DefaultChannelPipeline] [crate-dn-001] An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last ha\
ndler in the pipeline did not handle the exception.
org.elasticsearch.transport.NodeDisconnectedException: [crate-dn-001][192.168.239.30:4300][sql] disconnected
[2021-12-17T13:20:46,540][WARN ][o.e.i.IndexService ] [crate-dn-001] [data.quarter][data.quarter][3] failed to execute global checkpoint sync
org.elasticsearch.index.shard.ShardNotInPrimaryModeException: CurrentState[STARTED] shard is not in primary mode
at org.elasticsearch.index.shard.IndexShard.lambda$wrapPrimaryOperationPermitListener$21(IndexShard.java:2667) [crate-server.jar:?]
at org.elasticsearch.action.ActionListener$2.onResponse(ActionListener.java:90) [crate-server.jar:?]
at org.elasticsearch.index.shard.IndexShardOperationPermits.acquire(IndexShardOperationPermits.java:296) [crate-server.jar:?]
at org.elasticsearch.index.shard.IndexShardOperationPermits.acquire(IndexShardOperationPermits.java:246) [crate-server.jar:?]
at org.elasticsearch.index.shard.IndexShard.acquirePrimaryOperationPermit(IndexShard.java:2638) [crate-server.jar:?]
at org.elasticsearch.index.shard.IndexShard.runUnderPrimaryPermit(IndexShard.java:2718) [crate-server.jar:?]
at org.elasticsearch.index.IndexService.sync(IndexService.java:716) [crate-server.jar:?]
at org.elasticsearch.index.IndexService.maybeSyncGlobalCheckpoints(IndexService.java:696) [crate-server.jar:?]
at org.elasticsearch.index.IndexService$AsyncGlobalCheckpointTask.runInternal(IndexService.java:869) [crate-server.jar:?]
at org.elasticsearch.common.util.concurrent.AbstractAsyncTask.run(AbstractAsyncTask.java:143) [crate-server.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
at java.lang.Thread.run(Thread.java:831) [?:?]
[2021-12-17T13:20:51,786][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:20:56,788][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:21:01,788][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:21:06,789][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:21:11,789][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:21:16,790][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:21:21,791][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:21:26,791][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:21:31,792][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:21:36,793][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:21:41,794][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:21:46,794][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:21:51,795][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:20:56,788][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:21:01,788][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:21:06,789][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:21:11,789][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:21:16,790][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:21:21,791][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:21:26,791][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:21:31,792][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:21:36,793][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:21:41,794][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:21:46,794][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:21:51,795][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:21:56,796][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:22:01,796][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:22:06,797][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:22:11,797][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:22:12,946][WARN ][i.c.p.p.Messages ] [crate-dn-001] [crate-dn-001][192.168.239.30:4300][sql] disconnected
[2021-12-17T13:22:16,798][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:22:21,798][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:22:26,799][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:22:31,799][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:22:36,800][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:22:41,800][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:22:46,800][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:22:51,801][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:22:56,801][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:23:01,802][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:23:06,802][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:23:11,803][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:23:16,803][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:23:21,804][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:23:26,804][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:23:31,805][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:23:36,805][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:23:41,806][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:23:46,806][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:23:51,807][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:23:56,807][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:24:01,808][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:24:06,808][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:24:11,809][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:24:16,809][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:24:21,810][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:24:26,810][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:24:31,811][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:24:36,811][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:24:41,812][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:24:46,812][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:24:51,813][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:24:56,813][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:25:01,814][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:25:06,815][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:25:11,815][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:25:16,816][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:25:21,816][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:25:26,817][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:27:16,829][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:27:21,830][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:27:26,830][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:27:31,831][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:27:36,831][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:27:41,832][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:27:46,832][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:27:51,832][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:27:56,833][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:27:59,043][WARN ][i.c.p.p.Messages ] [crate-dn-001] [crate-dn-001][192.168.239.30:4300][sql] disconnected
[2021-12-17T13:28:01,833][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:28:06,834][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:28:11,834][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:28:16,835][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:28:21,835][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:28:26,836][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
[2021-12-17T13:28:31,836][INFO ][i.c.c.g.DecommissioningService] [crate-dn-001] There are still active requests on this node, delaying graceful shutdown
Any ideas how I can make this Node 01 of our CrateDB cluster finish graceful shutdown?
On top of that: I can’t connect to it via crash
utility (as superuser) anymore (it just hangs, stuck waiting).