Data too large?! I have 24GB of RAM for 3 nodes and 12 cores

miguel.arregui · May 31, 2020, 8:57am

I need to run a query across 500GB of data. Do I need 500GB of RAM to hold it?

miguel.arregui · May 31, 2020, 9:38am

CrateDB and memory management:

CrateDB runs on the JVM. You set its heap size with the environment variable CRATE_HEAP_SIZE.
CrateDB uses the G1 garbage collector. Large heaps may eventually result in increased latency, which can be addressed by fine-tuning the G1 parameters.
When you issue a query, all intermediate result structures, as well as the final ones, must reside in heap space. The heap therefore needs to be at least as big as these result sets, whose size depends on your specific use case.
Our usual recommendation is to start off with a heap of 25% of the memory available on the host. Note that all Lucene-level memory management is done off heap, via memory-mapped files. Lucene is the indexing/retrieval/persistence engine we use at the bottom of the application stack.
We also recommend that you do not exceed 30.5GB of heap, so that you can benefit from a JVM-level optimisation (compressed ordinary object pointers).
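To make the sizing rule above concrete, here is a minimal sketch of the arithmetic. The function name and the use of binary gigabytes are my own choices for illustration, not something CrateDB provides; only the 25% starting point and the 30.5GB ceiling come from the recommendation above.

```python
GIB = 1024 ** 3  # assumption: "GB" is read as binary gigabytes here


def recommended_heap_bytes(host_ram_bytes, fraction=0.25, cap_bytes=int(30.5 * GIB)):
    """Starting heap size: a fraction of host RAM, never above the 30.5GB ceiling.

    Hypothetical helper for illustration only; it is not part of CrateDB.
    """
    return min(int(host_ram_bytes * fraction), cap_bytes)


if __name__ == "__main__":
    # An 8GB node starts at a 2GB heap; a very large host is capped at 30.5GB.
    for ram_gib in (8, 64, 256):
        heap = recommended_heap_bytes(ram_gib * GIB)
        print(f"{ram_gib} GiB RAM -> {heap / GIB:.1f} GiB heap")
```

The remaining memory is not wasted: it is what the operating system uses to cache the memory-mapped Lucene files mentioned above.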

See also the memory configuration guide in the CrateDB documentation.

In addition, at the node level (crate.yml config file), you can configure bootstrap.memory_lock. If set to true, CrateDB executes the system call mlockall on startup (a.k.a. bootstrap), which locks the process memory in RAM so that it cannot be swapped out.

Finally, CrateDB uses a memory circuit breaker at the cluster level. Any query whose memory usage exceeds a certain threshold will be terminated, as will queries issued while the cluster is at its memory utilisation limit. There are six kinds of circuit breaker.

Parting thoughts. If your node has 8GB of RAM, the default query breaker limit of 60% gives 60% of 8GB = 4.8GB. Your query's intermediate and final results (the live set) would need to fit in 4.8GB.
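The same arithmetic as a small sketch, in case you want to plug in your own numbers. The helper names are hypothetical, and the memory figure the percentage applies to is passed in explicitly so you can use whichever base matches your setup; the 60% default is the one quoted above.

```python
def query_breaker_limit_gb(memory_gb, breaker_percent=60):
    """Memory available to a query before the breaker trips, in GB.

    Hypothetical helper mirroring the arithmetic in the post.
    """
    return memory_gb * breaker_percent / 100


def fits_under_breaker(live_set_gb, memory_gb, breaker_percent=60):
    """True if the query's live set stays within the breaker limit."""
    return live_set_gb <= query_breaker_limit_gb(memory_gb, breaker_percent)


if __name__ == "__main__":
    print(query_breaker_limit_gb(8))       # limit for the 8GB example
    print(fits_under_breaker(3.0, 8))      # a 3GB live set fits
    print(fits_under_breaker(500.0, 8))    # a 500GB live set does not
```

Note that what has to fit is the live set of the query, not the 500GB of data on disk. An aggregation that reduces the data to a few rows needs far less than a query that materialises every row.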

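A count(distinct) query on a very large dataset will tend to be shut down by the query breaker, because an exact distinct count has to keep every distinct value in memory. For such cases we recommend the hyperloglog_distinct aggregation, which returns an approximate count using a small, fixed amount of memory.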
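To give an intuition for why the approximate approach stays within the breaker, here is a toy HyperLogLog in Python. It is a sketch of the general technique only, not CrateDB's implementation; the class name, the choice of SHA-1 as the hash, and the precision of 12 bits are all my own assumptions for illustration.

```python
import hashlib
import math


class HyperLogLog:
    """Toy HyperLogLog: approximate distinct count in a fixed number of registers."""

    def __init__(self, precision=12):
        self.p = precision                  # number of hash bits used as register index
        self.m = 1 << precision             # number of registers (4096 for p=12)
        self.registers = [0] * self.m
        # Bias-correction constant from the HyperLogLog paper, valid for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, value):
        # Derive a 64-bit hash from the value's string form.
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        index = h >> (64 - self.p)                    # top p bits pick the register
        rest = h & ((1 << (64 - self.p)) - 1)         # remaining 64 - p bits
        # Rank = position of the leftmost 1-bit in the remaining bits (1-based).
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[index]:
            self.registers[index] = rank

    def count(self):
        # Raw estimate: normalised harmonic mean of 2^register over all registers.
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        # Small-range correction: fall back to linear counting.
        if estimate <= 2.5 * self.m and zeros:
            estimate = self.m * math.log(self.m / zeros)
        return int(round(estimate))


if __name__ == "__main__":
    hll = HyperLogLog()
    for i in range(100_000):
        hll.add(f"user-{i}")
        hll.add(f"user-{i}")  # duplicates never change a register
    print("approximate distinct count:", hll.count())
```

However many values you feed it, the sketch keeps only its 4096 small registers, whereas an exact distinct count needs a set that grows with the number of distinct values. The price is an error of a few percent, which shrinks as the precision is raised.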
