What can be the root cause of a recent crash we encountered in one of our CrateDB nodes?

Hello,

We are running a 3-node CrateDB 4.6.6 cluster on our Debian GNU/Linux 10.11 servers.

Each node has 64 GB RAM and 16 CPU cores.

Each node (Debian server) has a separate 450 GB disk dedicated to CrateDB, mounted on /data.

Each node’s /data disk has about 36 GB of data stored.

Our largest time-series table has about 200 million rows and is constantly growing.

We also have several .NET applications that continuously insert time-series data into our CrateDB cluster; they run around the clock and insert rows almost every second.

We’re trying to track down a weird crash that we experienced on the morning of 2022-01-25 (this Tuesday).

Below is the summary of events:

Last Friday evening, we installed the latest version of Apache Superset on a new server. From there we were able to connect to CrateDB and run some simple queries to check that Apache Superset could visualize data from our CrateDB cluster. Everything was fine.

A few days later, on Tuesday morning, one of our colleagues used Apache Superset, and we suspect that some SQL queries that tried to return a lot of data from CrateDB were run through it.

Around that time we noticed that CrateDB reported a critical error:

  • One of the shards (shard 3 on node 02) of the table data.meterreading was missing.

The shard was reported as missing in the web-based Admin UI, under “Tables and shard overview”.

However, the following query reported all shards as started:

select * from sys.allocations where table_name = 'meterreading';

The missing shard was the one marked as primary.

The replica shard did not automatically become the new primary, probably due to the state mismatch described above.
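
For cross-checking, the routing state of the individual shards can also be read from sys.shards. A query along these lines (a sketch; the column names are taken from the sys.shards documentation for CrateDB 4.x, and a truly missing shard would simply not appear in the result) lets us compare what the cluster reports against sys.allocations:

-- Sketch: cross-check shard state for the affected table.
-- "primary" is quoted because PRIMARY is a keyword.
SELECT schema_name,
       table_name,
       id AS shard_id,
       "primary",
       state,
       routing_state,
       node['name'] AS node_name
FROM sys.shards
WHERE schema_name = 'data'
  AND table_name = 'meterreading'
ORDER BY id, "primary" DESC;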

Then one of our colleagues rebooted the problematic node (node 02). On reboot, CrateDB performed a normal failover, relocating and replicating shards correctly across the remaining two nodes. Once node 02 came back online, it rejoined the CrateDB cluster correctly.

The database was in the erroneous state approximately between 07:00 UTC and 09:30 UTC.

We then realized that, while the database was in this problematic state, a lot of INSERTs from our .NET applications failed without triggering an Npgsql exception (if such exceptions had been triggered, those .NET apps would have queued the data on the local file system so it could be inserted later). Some data did make it in, though, probably the rows that were routed to different shards.
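
To gauge how much data is actually missing from that window, a per-hour row count over the incident period should show the gap. Something along these lines (a sketch; "ts" is a placeholder for the table's actual timestamp column):

-- Sketch: per-hour row counts around the incident window (times in UTC).
-- "ts" stands in for the real timestamp column of data.meterreading.
SELECT date_trunc('hour', ts) AS hour,
       COUNT(*) AS row_count
FROM data.meterreading
WHERE ts >= '2022-01-25T06:00:00' AND ts < '2022-01-25T11:00:00'
GROUP BY hour
ORDER BY hour;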

We suspect one of the “problematic” queries that returned a lot of data was something like the following:

SELECT field1 AS f1, field2 AS f2, field3 AS f3, COUNT(*) AS count
FROM data.meterreading
GROUP BY f1, f2, f3
ORDER BY count DESC
LIMIT 10000;

When we ran the following query, we could see that the query above had been executed multiple times, resulting in different errors each time. But we believe these errors should have been handled by CrateDB and should not have crashed one of the shards.

select * from sys.operations_log where error is not null order by started desc limit 200;
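
A sys.jobs_log variant of the same check (a sketch; times in UTC) would additionally show the statement text of whatever failed in that window:

select started, username, stmt, error
from sys.jobs_log
where error is not null
  and started >= '2022-01-25T07:00:00'
  and started <= '2022-01-25T09:30:00'
order by started desc
limit 200;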

Checking the /var/log/crate/crate-energytix-2022-01-25.log file on node 02 of our CrateDB cluster, around the time we encountered the issues above, we see messages such as:

[2022-01-25T07:19:57,679][WARN ][i.c.e.d.u.TransportShardUpsertAction] [crate-dn-002] [[data.meterreading][3]] failed to perform internal:crate:sql/data/write on replica [data.meterreading][3], node[FM5C1UbYRJSyPnRQytu25Q], [R], s[STARTED], a[id=jkrXuuNhT9qfPMM71hTMdw]
org.elasticsearch.transport.RemoteTransportException: [crate-dn-001][192.168.239.30:4300][internal:crate:sql/data/write[r]]
Caused by: org.elasticsearch.common.breaker.CircuitBreakingException: [parent] Data too large, data for [<transport_request>] would be [16323380244/15.2gb], which is larger than the limit of [16320875724/15.1gb], usages [request=0/0b, in_flight_requests=2788/2.7kb, query=10009706496/9.3gb, jobs_log=455914552/434.7mb, operations_log=724734104/691.1mb, accounting=1470742/1.4mb]
        at org.elasticsearch.indices.breaker.HierarchyCircuitBreakerService.checkParentLimit(HierarchyCircuitBreakerService.java:326) ~[crate-server.jar:?]
        at org.elasticsearch.common.breaker.ChildMemoryCircuitBreaker.addEstimateBytesAndMaybeBreak(ChildMemoryCircuitBreaker.java:126) ~[crate-server.jar:?]
...

[2022-01-25T07:19:57,709][WARN ][o.e.c.a.s.ShardStateAction] [crate-dn-002] unexpected failure while sending request [internal:cluster/shard/failure] to [{crate-dn-001}{FM5C1UbYRJSyPnRQytu25Q}{ZLwF-KXEQ0OHYbCu3x6Oxg}{192.168.239.30}{192.168.239.30:4300}{http_address=192.168.239.30:4200}] for shard entry [shard id [[data.meterreading][3]], allocation id [jkrXuuNhT9qfPMM71hTMdw], primary term [4], message [failed to perform internal:crate:sql/data/write on replica [data.meterreading][3], node[FM5C1UbYRJSyPnRQytu25Q], [R], s[STARTED], a[id=jkrXuuNhT9qfPMM71hTMdw]], markAsStale [true], failure [org.elasticsearch.transport.RemoteTransportException: [crate-dn-001][192.168.239.30:4300][internal:crate:sql/data/write[r]]
Caused by: org.elasticsearch.common.breaker.CircuitBreakingException: [parent] Data too large, data for [<transport_request>] would be [16323380244/15.2gb], which is larger than the limit of [16320875724/15.1gb], usages [request=0/0b, in_flight_requests=2788/2.7kb, query=10009706496/9.3gb, jobs_log=455914552/434.7mb, operations_log=724734104/691.1mb, accounting=1470742/1.4mb]
        at org.elasticsearch.indices.breaker.HierarchyCircuitBreakerService.checkParentLimit(HierarchyCircuitBreakerService.java:326)
        at org.elasticsearch.common.breaker.ChildMemoryCircuitBreaker.addEstimateBytesAndMaybeBreak(ChildMemoryCircuitBreaker.java:126)
        at org.elasticsearch.transport.InboundHandler.handleRequest(InboundHandler.java:166)
...

org.elasticsearch.transport.RemoteTransportException: [crate-dn-001][192.168.239.30:4300][internal:cluster/shard/failure]
Caused by: org.elasticsearch.common.breaker.CircuitBreakingException: [parent] Data too large, data for [<transport_request>] would be [16323380890/15.2gb], which is larger than the limit of [16320875724/15.1gb], usages [request=0/0b, in_flight_requests=3434/3.3kb, query=10009706496/9.3gb, jobs_log=455914552/434.7mb, operations_log=724734104/691.1mb, accounting=1470742/1.4mb]
...

[2022-01-25T07:19:57,817][WARN ][o.e.i.e.Engine           ] [crate-dn-002] [data.meterreading][3] failed engine [primary shard [[data.meterreading][3], node[ZonWXPbIToyAB8tePVErgw], [P], s[STARTED], a[id=SK9ewTQOSN2rXnf75nmb_Q]] was demoted while failing replica shard]
org.elasticsearch.common.breaker.CircuitBreakingException: [parent] Data too large, data for [<transport_request>] would be [16323380890/15.2gb], which is larger than the limit of [16320875724/15.1gb], usages [request=0/0b, in_flight_requests=3434/3.3kb, query=10009706496/9.3gb, jobs_log=455914552/434.7mb, operations_log=724734104/691.1mb, accounting=1470742/1.4mb]
        at org.elasticsearch.indices.breaker.HierarchyCircuitBreakerService.checkParentLimit(HierarchyCircuitBreakerService.java:326) ~[crate-server.jar:?]
        at org.elasticsearch.common.breaker.ChildMemoryCircuitBreaker.addEstimateBytesAndMaybeBreak(ChildMemoryCircuitBreaker.java:126) ~[crate-server.jar:?]
...

[2022-01-25T07:19:57,831][WARN ][o.e.i.c.IndicesClusterStateService] [crate-dn-002] [data.meterreading][3] marking and sending shard failed due to [shard failure, reason [primary shard [[data.meterreading][3], node[ZonWXPbIToyAB8tePVErgw], [P], s[STARTED], a[id=SK9ewTQOSN2rXnf75nmb_Q]] was demoted while failing replica shard]]
org.elasticsearch.common.breaker.CircuitBreakingException: [parent] Data too large, data for [<transport_request>] would be [16323380890/15.2gb], which is larger than the limit of [16320875724/15.1gb], usages [request=0/0b, in_flight_requests=3434/3.3kb, query=10009706496/9.3gb, jobs_log=455914552/434.7mb, operations_log=724734104/691.1mb, accounting=1470742/1.4mb]
        at org.elasticsearch.indices.breaker.HierarchyCircuitBreakerService.checkParentLimit(HierarchyCircuitBreakerService.java:326) ~[crate-server.jar:?]
        at org.elasticsearch.common.breaker.ChildMemoryCircuitBreaker.addEstimateBytesAndMaybeBreak(ChildMemoryCircuitBreaker.java:126) ~[crate-server.jar:?]
        at org.elasticsearch.transport.InboundHandler.handleRequest(InboundHandler.java:166) ~[crate-server.jar:?]
...

[2022-01-25T07:19:57,834][WARN ][o.e.c.a.s.ShardStateAction] [crate-dn-002] unexpected failure while sending request [internal:cluster/shard/failure] to [{crate-dn-001}{FM5C1UbYRJSyPnRQytu25Q}{ZLwF-KXEQ0OHYbCu3x6Oxg}{192.168.239.30}{192.168.239.30:4300}{http_address=192.168.239.30:4200}] for shard entry [shard id [[data.meterreading][3]], allocation id [SK9ewTQOSN2rXnf75nmb_Q], primary term [0], message [shard failure, reason [primary shard [[data.meterreading][3], node[ZonWXPbIToyAB8tePVErgw], [P], s[STARTED], a[id=SK9ewTQOSN2rXnf75nmb_Q]] was demoted while failing replica shard]], markAsStale [true], failure [org.elasticsearch.common.breaker.CircuitBreakingException: [parent] Data too large, data for [<transport_request>] would be [16323380890/15.2gb], which is larger than the limit of [16320875724/15.1gb], usages [request=0/0b, in_flight_requests=3434/3.3kb, query=10009706496/9.3gb, jobs_log=455914552/434.7mb, operations_log=724734104/691.1mb, accounting=1470742/1.4mb]
        at org.elasticsearch.indices.breaker.HierarchyCircuitBreakerService.checkParentLimit(HierarchyCircuitBreakerService.java:326)
        at org.elasticsearch.common.breaker.ChildMemoryCircuitBreaker.addEstimateBytesAndMaybeBreak(ChildMemoryCircuitBreaker.java:126)
...

[2022-01-25T09:34:04,191][WARN ][o.e.c.a.s.ShardStateAction] [crate-dn-002] unexpected failure while sending request [internal:cluster/shard/failure] to [{crate-dn-001}{FM5C1UbYRJSyPnRQytu25Q}{ZLwF-KXEQ0OHYbCu3x6Oxg}{192.168.239.30}{192.168.239.30:4300}{http_address=192.168.239.30:4200}] for shard entry [shard id [[log.requests][3]], allocation id [jHbvLAgTR46yXjjNs3XFWg], primary term [4], message [failed to perform internal:crate:sql/data/write on replica [log.requests][3], node[FM5C1UbYRJSyPnRQytu25Q], [R], s[STARTED], a[id=jHbvLAgTR46yXjjNs3XFWg]], markAsStale [true], failure [org.elasticsearch.transport.RemoteTransportException: [crate-dn-001][192.168.239.30:4300][internal:crate:sql/data/write[r]]
Caused by: org.elasticsearch.common.breaker.CircuitBreakingException: [parent] Data too large, data for [<transport_request>] would be [16408822199/15.2gb], which is larger than the limit of [16320875724/15.1gb], usages [request=0/0b, in_flight_requests=2223/2.1kb, query=9386852352/8.7gb, jobs_log=521354264/497.2mb, operations_log=727150280/693.4mb, accounting=1497894/1.4mb]
        at org.elasticsearch.indices.breaker.HierarchyCircuitBreakerService.checkParentLimit(HierarchyCircuitBreakerService.java:326)
        at org.elasticsearch.common.breaker.ChildMemoryCircuitBreaker.addEstimateBytesAndMaybeBreak(ChildMemoryCircuitBreaker.java:126)
        at org.elasticsearch.transport.InboundHandler.handleRequest(InboundHandler.java:166)
...

[2022-01-25T09:34:04,205][WARN ][o.e.c.a.s.ShardStateAction] [crate-dn-002] unexpected failure while sending request [internal:cluster/shard/failure] to [{crate-dn-001}{FM5C1UbYRJSyPnRQytu25Q}{ZLwF-KXEQ0OHYbCu3x6Oxg}{192.168.239.30}{192.168.239.30:4300}{http_address=192.168.239.30:4200}] for shard entry [shard id [[log.requests][3]], allocation id [h1p9rQFVTeGLN2ybnBMq_w], primary term [0], message [shard failure, reason [primary shard [[log.requests][3], node[ZonWXPbIToyAB8tePVErgw], [P], s[STARTED], a[id=h1p9rQFVTeGLN2ybnBMq_w]] was demoted while failing replica shard]], markAsStale [true], failure [org.elasticsearch.common.breaker.CircuitBreakingException: [parent] Data too large, data for [<transport_request>] would be [16408823399/15.2gb], which is larger than the limit of [16320875724/15.1gb], usages [request=0/0b, in_flight_requests=3423/3.3kb, query=9386852352/8.7gb, jobs_log=521354264/497.2mb, operations_log=727150280/693.4mb, accounting=1497894/1.4mb]
        at org.elasticsearch.indices.breaker.HierarchyCircuitBreakerService.checkParentLimit(HierarchyCircuitBreakerService.java:326)
        at org.elasticsearch.common.breaker.ChildMemoryCircuitBreaker.addEstimateBytesAndMaybeBreak(ChildMemoryCircuitBreaker.java:126)
        at org.elasticsearch.transport.InboundHandler.handleRequest(InboundHandler.java:166)
        at org.elasticsearch.transport.InboundHandler.messageReceived(InboundHandler.java:115)
        at org.elasticsearch.transport.InboundHandler.inboundMessage(InboundHandler.java:104)
        at org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:704)
...

Can you help us identify what might be the root cause of this issue we experienced?

And what precautions should we take (e.g. configuration changes?) to avoid running into something similar in the future?
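
For example, would lowering the query circuit breaker be an appropriate precaution? A sketch of the kind of change we have in mind (we have not applied it; indices.breaker.query.limit defaults to 60% of the heap according to the docs, and the statement below assumes it can be changed at runtime):

-- Sketch only, not applied: lower the query circuit breaker limit cluster-wide.
SET GLOBAL PERSISTENT indices.breaker.query.limit = '50%';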
