How flexible is CrateDB when it comes to vertical scaling? For example, what if I increase the number of CPU cores and RAM per node after having created a table and ingested data into it?
This works without further steps: you can, for example, double a node's memory and CPU cores, and typically no immediate follow-up action is needed.
Can I simply alter the table settings to have more shards after I have increased the number of CPU cores?
Increasing the number of shards can be done on any table. However, twice as many CPU cores does not necessarily mean that you need twice as many shards right away.
The number of shards can be changed after table creation, provided the new value is compatible with number_of_routing_shards (set at table-creation time; the new shard count must divide it evenly). Altering the number of shards puts the table into a read-only state until the operation has completed.
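As a minimal sketch, the documented pattern is roughly the following. The table name `metrics` and the target shard count are hypothetical; the new count must be compatible with the table's number_of_routing_shards:

```sql
-- Block writes while the resize runs (the table is effectively read-only anyway)
ALTER TABLE metrics SET ("blocks.write" = true);

-- Change the shard count (hypothetical target value)
ALTER TABLE metrics SET (number_of_shards = 12);

-- Re-enable writes once the operation has completed
ALTER TABLE metrics SET ("blocks.write" = false);
```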
For growing use cases, however, you would typically use partitioned tables. With partitioned tables, you can change the number of shards for all partitions, for individual partitions, and/or for all future partitions.
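For partitioned tables, the variants look roughly like this sketch (the table `metrics`, its partition column `day`, and the shard counts are hypothetical):

```sql
-- Only future partitions: ALTER TABLE ONLY leaves existing partitions untouched
ALTER TABLE ONLY metrics SET (number_of_shards = 12);

-- A single existing partition, identified by its partition-column value
ALTER TABLE metrics PARTITION (day = '2023-01-01') SET ("blocks.write" = true);
ALTER TABLE metrics PARTITION (day = '2023-01-01') SET (number_of_shards = 12);
ALTER TABLE metrics PARTITION (day = '2023-01-01') SET ("blocks.write" = false);
```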
Also: are there some practical limitations for the amount of RAM and number of CPU cores per node?
As a Java application, CrateDB uses compressed oops, which effectively soft-limits the usable heap to ~30.5 GB. With the current recommendation of allocating 25% of total system memory to the heap, this works out to about 128 GB per CrateDB process / VM / container (see also Memory configuration — CrateDB: How-Tos). That being said, you can run bigger single nodes; however, due to CrateDB's distributed nature, several smaller nodes are often faster overall than fewer bigger ones.
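As a sketch, on a machine sized against that 128 GB ceiling, the heap could be configured via the CRATE_HEAP_SIZE environment variable (the file location shown is an assumption that varies by installation method):

```sh
# e.g. /etc/default/crate on Debian/Ubuntu package installs (location may differ)
# Keep the heap at ~25% of system memory and below the ~30.5 GB
# compressed-oops threshold, so the JVM can still use compressed pointers.
CRATE_HEAP_SIZE=30g
```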
Typical configurations include nodes with 16 vCPUs and 64 GiB of memory (e.g. AWS m5.4xlarge / Azure D16s). Depending on the use case, clusters scale up to 300 to 400 such nodes.
But we also support production use cases with 3 nodes of 4 vCPUs and 16 GiB of memory each.