Shard replication error

Hello guys,

I set up a three-node CrateDB 4.0.10 cluster, with all nodes running inside Docker containers, and I was experimenting with different replication and sharding options in order to run performance tests on a small (~160 MB) table.
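
For context, the kind of table definition I was playing with looks roughly like this (the table and column names are placeholders; the relevant parts are the shard count, the replica count, and the write.wait_for_active_shards setting, which matches the [ALL] in the error below):

```sql
-- Illustrative sketch only: table and column names are made up.
CREATE TABLE mtairquality.airqualityobserved (
    entity_id TEXT,
    observed  TIMESTAMP WITH TIME ZONE,
    value     DOUBLE PRECISION
) CLUSTERED INTO 4 SHARDS
WITH (
    number_of_replicas = 2,                  -- one copy per node in a 3-node cluster
    "write.wait_for_active_shards" = 'all'   -- writes require every copy to be active
);
```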

The thing is that sometimes, after creating tables with different replication parameters, the error below appears, and I understand neither where it comes from nor how to solve it:

crate-node_crate-node.1.jlt346zv0k4h@hopu-pc    | org.elasticsearch.transport.RemoteTransportException: [openstack-vm-node][185.36.208.159:4301][indices:admin/seq_no/global_checkpoint_sync[p]]
crate-node_crate-node.1.jlt346zv0k4h@hopu-pc    | Caused by: org.elasticsearch.action.UnavailableShardsException: [mtairquality.4shardsfullyrepletairqualityobservedold][1] Not enough active copies to meet shard count of [ALL] (have 1, needed 3). Timeout: [1m], request: [GlobalCheckpointSyncAction.Request{shardId=[mtairquality.4shardsfullyrepletairqualityobservedold][1], timeout=1m, index='mtairquality.4shardsfullyrepletairqualityobservedold', waitForActiveShards=ALL}]
crate-node_crate-node.1.jlt346zv0k4h@hopu-pc    | 	at org.elasticsearch.action.support.replication.ReplicationOperation.execute(ReplicationOperation.java:99) [crate-app.jar:4.0.10]
crate-node_crate-node.1.jlt346zv0k4h@hopu-pc    | 	at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.onResponse(TransportReplicationAction.java:347) [crate-app.jar:4.0.10]
crate-node_crate-node.1.jlt346zv0k4h@hopu-pc    | 	at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.onResponse(TransportReplicationAction.java:287) [crate-app.jar:4.0.10]
crate-node_crate-node.1.jlt346zv0k4h@hopu-pc    | 	at org.elasticsearch.action.support.replication.TransportReplicationAction$1.onResponse(TransportReplicationAction.java:948) [crate-app.jar:4.0.10]
crate-node_crate-node.1.jlt346zv0k4h@hopu-pc    | 	at org.elasticsearch.action.support.replication.TransportReplicationAction$1.onResponse(TransportReplicationAction.java:945) [crate-app.jar:4.0.10]
crate-node_crate-node.1.jlt346zv0k4h@hopu-pc    | 	at org.elasticsearch.index.shard.IndexShardOperationPermits.acquire(IndexShardOperationPermits.java:271) [crate-app.jar:4.0.10]
crate-node_crate-node.1.jlt346zv0k4h@hopu-pc    | 	at org.elasticsearch.index.shard.IndexShardOperationPermits.acquire(IndexShardOperationPermits.java:238) [crate-app.jar:4.0.10]
crate-node_crate-node.1.jlt346zv0k4h@hopu-pc    | 	at org.elasticsearch.index.shard.IndexShard.acquirePrimaryOperationPermit(IndexShard.java:2226) [crate-app.jar:4.0.10]
crate-node_crate-node.1.jlt346zv0k4h@hopu-pc    | 	at org.elasticsearch.action.support.replication.TransportReplicationAction.acquirePrimaryShardReference(TransportReplicationAction.java:957) [crate-app.jar:4.0.10]
crate-node_crate-node.1.jlt346zv0k4h@hopu-pc    | 	at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.doRun(TransportReplicationAction.java:308) [crate-app.jar:4.0.10]
crate-node_crate-node.1.jlt346zv0k4h@hopu-pc    | 	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [crate-app.jar:4.0.10]
crate-node_crate-node.1.jlt346zv0k4h@hopu-pc    | 	at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:283) [crate-app.jar:4.0.10]
crate-node_crate-node.1.jlt346zv0k4h@hopu-pc    | 	at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:275) [crate-app.jar:4.0.10]
crate-node_crate-node.1.jlt346zv0k4h@hopu-pc    | 	at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:63) [crate-app.jar:4.0.10]
crate-node_crate-node.1.jlt346zv0k4h@hopu-pc    | 	at org.elasticsearch.transport.TransportService$7.doRun(TransportService.java:674) [crate-app.jar:4.0.10]
crate-node_crate-node.1.jlt346zv0k4h@hopu-pc    | 	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:694) [crate-app.jar:4.0.10]
crate-node_crate-node.1.jlt346zv0k4h@hopu-pc    | 	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [crate-app.jar:4.0.10]
crate-node_crate-node.1.jlt346zv0k4h@hopu-pc    | 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
crate-node_crate-node.1.jlt346zv0k4h@hopu-pc    | 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
crate-node_crate-node.1.jlt346zv0k4h@hopu-pc    | 	at java.lang.Thread.run(Thread.java:835) [?:?]

The shards tab in the CrateDB admin UI stays in the same state, and there seems to be no progress on recovery.
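
For reference, this is how I can inspect the stuck shards from SQL (a sketch using CrateDB's sys.shards system table; the column names are from its documentation):

```sql
-- List shards that are not in the STARTED state, with their recovery progress.
SELECT schema_name,
       table_name,
       id AS shard_id,
       "primary",
       state,
       routing_state,
       recovery['stage'] AS recovery_stage,
       recovery['size']['percent'] AS recovered_pct
FROM sys.shards
WHERE state <> 'STARTED'
ORDER BY schema_name, table_name, id;
```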

Can you help me with this error? I don’t know what I am doing to trigger it, and I don’t know how to solve it without dropping the table or changing the number_of_replicas parameter.

Thank you in advance

Hey, sorry to hear the replicas are “red” … Did you check whether sys.allocations provides some more information? https://crate.io/docs/crate/guide/en/latest/best-practices/systables.html?highlight=recovery#id5 … Depending on what’s wrong, you could fire an ALTER CLUSTER REROUTE RETRY FAILED.
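
For example, something along these lines (a sketch; both sys.allocations and the REROUTE RETRY FAILED statement are documented CrateDB features):

```sql
-- Show why shards are not allocated; 'explanation' usually names the
-- reason (failed allocation, disk watermark, allocation rules, ...).
SELECT table_schema,
       table_name,
       shard_id,
       "primary",
       current_state,
       explanation
FROM sys.allocations
WHERE current_state <> 'STARTED';

-- If allocations failed (e.g. after a transient node outage),
-- ask the cluster to retry them:
ALTER CLUSTER REROUTE RETRY FAILED;
```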

Ok, thank you @Walter_Behmann.

I will try to get a better understanding of the sys tables and use them to health-check the cluster status.
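
For example, I guess something like this gives a per-table health overview (a sketch based on CrateDB's sys.health table):

```sql
-- Per-table health summary: 'health' is GREEN/YELLOW/RED, and the two
-- shard counters show what is missing or under-replicated.
SELECT table_schema,
       table_name,
       health,
       missing_shards,
       underreplicated_shards
FROM sys.health
ORDER BY severity DESC;
```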