Index size limit

Hi,

I have a table that is currently about 40 GB in size, and it will only keep growing, by roughly that amount per year.
So in two years we would have around 120 GB of data.

Can CrateDB handle this amount of data, and will I still be able to query over it?
Are there any settings I can change to improve performance?
Would more shards and maybe more nodes help?

Thanks in advance for any answers :slight_smile:

Yes, CrateDB can handle such a load. Adding nodes helps, because the query load is spread across them, provided sharding is set up properly. Shard size should stay within certain limits; https://www.elastic.co/blog/how-many-shards-should-i-have-in-my-elasticsearch-cluster is a good read on that topic, and since CrateDB uses the sharding logic of Elasticsearch, the advice applies to CrateDB as well. The bigger a shard, the more bytes need to be transferred on reallocation/recovery. Also, the maximum number of documents per shard is 2^31, which is a hard Lucene limit.
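As a rough sketch (table and column names here are just placeholders, not anything from your schema), the number of shards and replicas is defined when the table is created:

```sql
-- Hypothetical example: spread the table over 6 primary shards,
-- keeping 1 replica of each shard for redundancy.
CREATE TABLE sensor_readings (
    ts        TIMESTAMP WITH TIME ZONE,
    device_id TEXT,
    value     DOUBLE PRECISION
)
CLUSTERED INTO 6 SHARDS
WITH (number_of_replicas = 1);
```

With 6 shards on a 3-node cluster, for example, each node holds 2 primary shards, so queries fan out across all nodes. Keeping individual shards in the range of a few tens of GB (as discussed in the linked blog post) is a reasonable starting point.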

For time-based datasets, using a partitioned table can help a lot with read performance, since queries can be limited to hit only a small number of partitions (e.g. only the latest records). A minimal sketch of this is shown below.
See https://crate.io/docs/crate/reference/en/latest/general/ddl/partitioned-tables.html.
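A minimal sketch of a time-partitioned table (again with made-up names), using a generated column so that each month's data lands in its own partition:

```sql
-- Hypothetical example: partition the table by month of the timestamp.
CREATE TABLE sensor_readings (
    ts        TIMESTAMP WITH TIME ZONE,
    device_id TEXT,
    value     DOUBLE PRECISION,
    month     TIMESTAMP WITH TIME ZONE
              GENERATED ALWAYS AS date_trunc('month', ts)
)
PARTITIONED BY (month);

-- Queries that filter on the partition column only touch the matching
-- partitions instead of scanning the whole table.
SELECT device_id, avg(value)
FROM sensor_readings
WHERE month >= '2023-01-01'
GROUP BY device_id;
```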