Mmap error followed by node crash in crate cluster

Hi all,
I am seeing the errors below at random, and the CrateDB service crashes with them. Any help would be appreciated.

Caused by: java.io.IOException: Map failed: MMapIndexInput(path="/data/crate/nodes/0/indices/96h08KOgQ26VJ1vNaI_fXA/1/index/_3_1_Lucene80_0.dvd") [this may be caused by lack of enough unfragmented virtual address space or too restrictive virtual memory limits enforced by the operating system, preventing us to map a chunk of 89 bytes. Please review 'ulimit -v', 'ulimit -m' (both should return 'unlimited'), and 'sysctl vm.max_map_count'. More information: The Generics Policeman Blog: Use Lucene's MMapDirectory on 64bit platforms, please!]

Jul 20 16:06:31 MyMachine.example.com crate: OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x00007f90fa381000, 65536, 1) failed; error='Not enough space' (errno=12)
Jul 20 16:06:31 MyMachine.example.com crate: [thread 10825 also had an error]
Jul 20 16:06:33 MyMachine.example.com systemd: crate.service: main process exited, code=exited, status=1/FAILURE
Jul 20 16:06:33 MyMachine.example.com systemd: Unit crate.service entered failed state.
Jul 20 16:06:33 MyMachine.example.com systemd: crate.service failed.

A fatal error has been detected by the Java Runtime Environment:
Native memory allocation (mprotect) failed to protect 16384 bytes for memory to guard stack pages
Possible reasons:
The system is out of physical RAM or swap space
The process is running with CompressedOops enabled, and the Java Heap may be blocking the growth of the native heap
Possible solutions:
Reduce memory load on the system
Increase physical memory or swap space
Check if swap backing store is full
Decrease Java heap size (-Xmx/-Xms)
Decrease number of Java threads
Decrease Java thread stack sizes (-Xss)
Set larger code cache with -XX:ReservedCodeCacheSize=
JVM is running with Zero Based Compressed Oops mode in which the Java heap is
placed in the first 32GB address space. The Java Heap base address is the
maximum limit for the native heap growth. Please use -XX:HeapBaseMinAddress
to set the Java Heap base and to place the Java Heap above 32GB virtual address.
This output file may be truncated or incomplete.

Internal Error (stackOverflow.cpp:106), pid=121416, tid=124520
Error: memory to guard stack pages

There is insufficient memory for the Java Runtime Environment to continue.
Native memory allocation (mmap) failed to map 65536 bytes for committing reserved memory.
Possible reasons:
The system is out of physical RAM or swap space
The process is running with CompressedOops enabled, and the Java Heap may be blocking the growth of the native heap
Possible solutions:
Reduce memory load on the system
Increase physical memory or swap space
Check if swap backing store is full
Decrease Java heap size (-Xmx/-Xms)
Decrease number of Java threads
Decrease Java thread stack sizes (-Xss)
Set larger code cache with -XX:ReservedCodeCacheSize=
JVM is running with Zero Based Compressed Oops mode in which the Java heap is
placed in the first 32GB address space. The Java Heap base address is the
maximum limit for the native heap growth. Please use -XX:HeapBaseMinAddress
to set the Java Heap base and to place the Java Heap above 32GB virtual address.
This output file may be truncated or incomplete.

Out of Memory Error (os_linux.cpp:2760), pid=16228, tid=16246

JRE version: OpenJDK Runtime Environment Temurin-17.0.3+7 (17.0.3+7) (build 17.0.3+7)
Java VM: OpenJDK 64-Bit Server VM Temurin-17.0.3+7 (17.0.3+7, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again

Please note that ulimit -v and ulimit -m already return unlimited, and vm.max_map_count is set to 262000+.
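For reference, these values can be double-checked on the host with something like the following (a sketch; note that limits for a systemd-managed service are governed by the unit file, not by the login shell you run these commands in):

```shell
# Virtual memory and max memory size limits for the current shell
ulimit -v   # should print "unlimited"
ulimit -m   # should print "unlimited"

# Kernel limit on memory map areas per process
# (Lucene's MMapDirectory needs a high value here)
cat /proc/sys/vm/max_map_count
```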

bootstrap.memory_lock: true in crate.yml

node.store.allow_mmap is commented out, i.e. it runs with its default value of true.

The heap is configured at 30G and another 30G is left free for the OS. No other application is running on my CentOS machine. There is enough disk space for swapping.

This issue is seen on CrateDB 4.6.7 as well as 4.8.1. I am loading 500GB to 2TB of data per partition in a 3-node cluster.

Am I missing some setting at the CrateDB or OS level?

Thank you.

The heap is configured at 30G and another 30G is left free for the OS. No other application is running on my CentOS machine. There is enough disk space for swapping.

bootstrap.memory_lock: true prevents any kind of swapping. If you have not taken any other measures to prevent swapping, it is correct to keep it turned on.

The recommendation for the allocated heap was changed to 25% of system memory by default with CrateDB 4.2 and up. CrateDB relies heavily on mmap, and having "only" 30GiB shared with the OS might be too little for your use case.

I am loading 500GB to 2TB of data per partition in a 3-node cluster.

How many shards does a partition have?
How many shards does the cluster have in total on how many nodes?
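If it helps with answering, the shard distribution per node can be inspected directly from the sys.shards table (a sketch; node['name'] and size are assumed per the CrateDB sys tables, with size reported in bytes):

```sql
SELECT node['name'] AS node_name,
       count(*) AS shard_count,
       sum(size) / 1024.0 / 1024 / 1024 AS total_gib
FROM sys.shards
GROUP BY node['name']
ORDER BY node['name'];
```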

Thanks for mentioning it. Do you mean I should set a 30GB heap and keep 90GB free? Or keep a 16GB heap and 48GB free?

Cluster 1:
50 shards per partition, 10 nodes, 1,500 shards in the cluster in total. Each shard is approximately 20GB.

Cluster 2:
30 shards per partition, 3 nodes, 7 partitions in total, so 210 shards in the cluster. Each shard is approximately 45GB.

This was done as one way to improve the ingestion speed. Do you think it will impact the stability of the cluster?

This was done as one way to improve the ingestion speed. Do you think it will impact the stability of the cluster?

It should be enabled if no other measures are taken to ensure that CrateDB does not swap (it should not swap).
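Note that with bootstrap.memory_lock: true the service also needs permission from the OS to lock memory. Under systemd this is typically granted with a unit override (a sketch; the unit name crate.service is taken from your logs, the override path is an assumption for your setup):

```ini
# /etc/systemd/system/crate.service.d/override.conf
[Service]
LimitMEMLOCK=infinity
```

After adding the override, run `systemctl daemon-reload` and restart the service so the new limit takes effect.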

Thanks for mentioning it. Do you mean I should set a 30GB heap and keep 90GB free? Or keep a 16GB heap and 48GB free?

If you have a total of 64 GiB of system memory available, the default recommendation is to use 16 GiB for the CrateDB heap and leave 48 GiB for the system. CrateDB relies heavily on mmap, which uses system memory.
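On a package install, the heap is usually set via the CRATE_HEAP_SIZE environment variable; depending on the distribution it lives in a file such as /etc/sysconfig/crate or /etc/default/crate (a sketch; the file location is an assumption for your CentOS setup):

```shell
# Set the CrateDB heap to ~25% of system memory, e.g. 16 GiB on a 64 GiB host.
# Assumption: this line goes into /etc/sysconfig/crate on CentOS.
CRATE_HEAP_SIZE=16g
```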

You can also check the number of open file descriptors in use directly in CrateDB:

SELECT
    process['max_open_file_descriptors'],
    process['open_file_descriptors']
FROM sys.nodes
LIMIT 100;

Do you mean I should not have any swapping enabled on the machine? In that case I can configure the machines to have no swap space at all.