Guide to decrease the number of shards in your cluster

Potentially you created your tables with too many shards and want to rectify this situation. Some indications for too many shards can be:

  • you have many small shards with typical shard sizes being below 5GB, or
  • you have over-allocated the number of shards in your tables and are noticing performance degradation due to this, or
  • your cluster grew over time and you will hit the cluster.max_shards_per_node limit of 1000 shards per node soon

You likely want to take action. But what are your options?

Potentially your sharding strategy is fine

First, give it a thought if your sharding strategy is sound. Do you have correctly sized shards? Is your performance good? Do your business use cases require you to retain a specific amount of data which in turn requires a specific amount of shards?

If yes, you probably did something very right :raised_hands:. If you are going to hit the shard limit soon you should likely think about scaling your cluster by adding additional nodes to it.

If not though check out the options below.

Change number of shards for future partitions

If you have a partitioned table and no immediate action is required your easiest option is to change the number of shards going forward and leave old partitions at the old levels. Over time your retention policy will phase out your shard over-allocated partitions.

You can change the number of partitions going forward with the following command:

ALTER TABLE <table_name> SET (number_of_shards = <reduced_number_of_shards>);

Change sharding for an existing table

Given you want to retroactively change the number of shards for an existing table or partitions you need to follow a number of steps to execute this course of action. You have two possibilities to do this both with certain advantages and drawbacks:

Re-Create table

This is likely the easier option to execute.

First, create a second table with the desired number of shards. Afterwards, use an INSERT INTO SELECT to copy the data from the old table to the new table:

INSERT INTO <new_table_name> (SELECT * FROM <old_table_name>);

Finally, you can drop the old table and optionally rename the new table to the old table name.

Change number of shards in place

It is also possible to change the number of shards in place but certain restrictions apply:

  • You need a node that is capable to hold the data of the whole table
  • The new number of shards must be a factor of the current number of shards (e.g. if you currently have 15 shards you can reduce to either 5, 3, or 1 shard)
  • During this process, you will not have any replicas of your shards which could potentially lead to data loss if you should lose this node during the operation
  • If you have replicas configured your data health check will temporarily switch to orange

If you are ok with these you can proceed by following these steps:

  1. First, we will allocate all shards to a specific node:

    ALTER TABLE <table_name> SET ("routing.allocation.require._name" = '<selected_node_name>');
    
  2. Once this process is finished you need to set the table into read-only mode:

    ALTER TABLE <table_name> SET ("blocks.write" = true);
    
  3. Now reduce the number of shards to the desired number (only factors of the current number of shards are valid):

    ALTER TABLE <table_name> SET (number_of_shards = 5);
    
  4. Finally, remove the read-only mode and the allocation filter:

    ALTER TABLE <table_name> RESET ("routing.allocation.require._name", "blocks.write");
    
3 Likes

Hi @jayeff , Is it possible to reduce the shards for certain partitions only? e.g. if the partition is on workday and I would want only few days to have less shards. And this is required after the data is ingested for those days.

@vinayak.shukre

Yes you have to use the ALTER TABLE with PARTITION

e.g.

ALTER TABLE <table_name> PARTITION (partition_column = value [ , ... ]) SET (number_of_shards = 5);

It is easier to do that beforehand though.

also see

Thanks. Will try this. I thought this will only work before creating the partitions. I have tried it that way.