Shard replication error

Hello guys,

I set up a three-node CrateDB 4.0.10 cluster, with all nodes running inside Docker containers, and I was experimenting with different replication and sharding options in order to run performance tests on a small (~160 MB) table.
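
For context, the kind of table definition I was playing with looks roughly like this (the table and column names are placeholders; the relevant parts are the shard count, the replica count, and the write.wait_for_active_shards setting, which matches the [ALL] in the error below):

```sql
-- Illustrative sketch only: table and column names are made up.
CREATE TABLE mtairquality.airqualityobserved (
    entity_id TEXT,
    observed  TIMESTAMP WITH TIME ZONE,
    value     DOUBLE PRECISION
) CLUSTERED INTO 4 SHARDS
WITH (
    number_of_replicas = 2,                  -- one copy per node in a 3-node cluster
    "write.wait_for_active_shards" = 'all'   -- writes require every copy to be active
);
```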

The thing is that sometimes, after creating tables with different replication parameters, the error below appears, and I understand neither where it comes from nor how to solve it:

crate-node_crate-node.1.jlt346zv0k4h@hopu-pc    | org.elasticsearch.transport.RemoteTransportException: [openstack-vm-node][185.36.208.159:4301][indices:admin/seq_no/global_checkpoint_sync[p]]
crate-node_crate-node.1.jlt346zv0k4h@hopu-pc    | Caused by: org.elasticsearch.action.UnavailableShardsException: [mtairquality.4shardsfullyrepletairqualityobservedold][1] Not enough active copies to meet shard count of [ALL] (have 1, needed 3). Timeout: [1m], request: [GlobalCheckpointSyncAction.Request{shardId=[mtairquality.4shardsfullyrepletairqualityobservedold][1], timeout=1m, index='mtairquality.4shardsfullyrepletairqualityobservedold', waitForActiveShards=ALL}]
crate-node_crate-node.1.jlt346zv0k4h@hopu-pc    | 	at org.elasticsearch.action.support.replication.ReplicationOperation.execute(ReplicationOperation.java:99) [crate-app.jar:4.0.10]
crate-node_crate-node.1.jlt346zv0k4h@hopu-pc    | 	at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.onResponse(TransportReplicationAction.java:347) [crate-app.jar:4.0.10]
crate-node_crate-node.1.jlt346zv0k4h@hopu-pc    | 	at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.onResponse(TransportReplicationAction.java:287) [crate-app.jar:4.0.10]
crate-node_crate-node.1.jlt346zv0k4h@hopu-pc    | 	at org.elasticsearch.action.support.replication.TransportReplicationAction$1.onResponse(TransportReplicationAction.java:948) [crate-app.jar:4.0.10]
crate-node_crate-node.1.jlt346zv0k4h@hopu-pc    | 	at org.elasticsearch.action.support.replication.TransportReplicationAction$1.onResponse(TransportReplicationAction.java:945) [crate-app.jar:4.0.10]
crate-node_crate-node.1.jlt346zv0k4h@hopu-pc    | 	at org.elasticsearch.index.shard.IndexShardOperationPermits.acquire(IndexShardOperationPermits.java:271) [crate-app.jar:4.0.10]
crate-node_crate-node.1.jlt346zv0k4h@hopu-pc    | 	at org.elasticsearch.index.shard.IndexShardOperationPermits.acquire(IndexShardOperationPermits.java:238) [crate-app.jar:4.0.10]
crate-node_crate-node.1.jlt346zv0k4h@hopu-pc    | 	at org.elasticsearch.index.shard.IndexShard.acquirePrimaryOperationPermit(IndexShard.java:2226) [crate-app.jar:4.0.10]
crate-node_crate-node.1.jlt346zv0k4h@hopu-pc    | 	at org.elasticsearch.action.support.replication.TransportReplicationAction.acquirePrimaryShardReference(TransportReplicationAction.java:957) [crate-app.jar:4.0.10]
crate-node_crate-node.1.jlt346zv0k4h@hopu-pc    | 	at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.doRun(TransportReplicationAction.java:308) [crate-app.jar:4.0.10]
crate-node_crate-node.1.jlt346zv0k4h@hopu-pc    | 	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [crate-app.jar:4.0.10]
crate-node_crate-node.1.jlt346zv0k4h@hopu-pc    | 	at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:283) [crate-app.jar:4.0.10]
crate-node_crate-node.1.jlt346zv0k4h@hopu-pc    | 	at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:275) [crate-app.jar:4.0.10]
crate-node_crate-node.1.jlt346zv0k4h@hopu-pc    | 	at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:63) [crate-app.jar:4.0.10]
crate-node_crate-node.1.jlt346zv0k4h@hopu-pc    | 	at org.elasticsearch.transport.TransportService$7.doRun(TransportService.java:674) [crate-app.jar:4.0.10]
crate-node_crate-node.1.jlt346zv0k4h@hopu-pc    | 	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:694) [crate-app.jar:4.0.10]
crate-node_crate-node.1.jlt346zv0k4h@hopu-pc    | 	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [crate-app.jar:4.0.10]
crate-node_crate-node.1.jlt346zv0k4h@hopu-pc    | 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
crate-node_crate-node.1.jlt346zv0k4h@hopu-pc    | 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
crate-node_crate-node.1.jlt346zv0k4h@hopu-pc    | 	at java.lang.Thread.run(Thread.java:835) [?:?]

The shards tab in the CrateDB admin UI stays in the same state, and there seems to be no progress on recovery.
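
For reference, this is how I can inspect the stuck shards from SQL (a sketch using CrateDB's sys.shards system table; the column names are from its documentation):

```sql
-- List shards that are not in the STARTED state, with their recovery progress.
SELECT schema_name,
       table_name,
       id AS shard_id,
       "primary",
       state,
       routing_state,
       recovery['stage'] AS recovery_stage,
       recovery['size']['percent'] AS recovered_pct
FROM sys.shards
WHERE state <> 'STARTED'
ORDER BY schema_name, table_name, id;
```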

Can you help me with this error? I don’t know what I am doing to trigger it, and I don’t know how to solve it without dropping the table or changing the number_of_replicas parameter.

Thank you in advance

Hey, sorry to hear the replicas are “red” … Did you check whether sys.allocations provides some more information? https://crate.io/docs/crate/guide/en/latest/best-practices/systables.html?highlight=recovery#id5 … Depending on what’s wrong, you could fire an ALTER CLUSTER REROUTE RETRY FAILED.
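
For example, something along these lines (a sketch; both sys.allocations and the REROUTE RETRY FAILED statement are documented CrateDB features):

```sql
-- Show why shards are not allocated; 'explanation' usually names the
-- reason (failed allocation, disk watermark, allocation rules, ...).
SELECT table_schema,
       table_name,
       shard_id,
       "primary",
       current_state,
       explanation
FROM sys.allocations
WHERE current_state <> 'STARTED';

-- If allocations failed (e.g. after a transient node outage),
-- ask the cluster to retry them:
ALTER CLUSTER REROUTE RETRY FAILED;
```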

Ok, thank you @Walter_Behmann.

I will try to get a better understanding of the sys tables and use them to health-check the cluster status.
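
For example, I guess something like this gives a per-table health overview (a sketch based on CrateDB's sys.health table):

```sql
-- Per-table health summary: 'health' is GREEN/YELLOW/RED, and the two
-- shard counters show what is missing or under-replicated.
SELECT table_schema,
       table_name,
       health,
       missing_shards,
       underreplicated_shards
FROM sys.health
ORDER BY severity DESC;
```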