Partioning question

Hello all, me again

OK at the moment I’ve got partitioning structure where a the month number is stored in the tables and this the partitioning column.

CREATE TABLE IF NOT EXISTS "doc"."v3_safe_1526595589" (
   "ts" TIMESTAMP WITH TIME ZONE,
...
...
   "dmax3" REAL,
   "id" VARCHAR(40) NOT NULL,
   "roundts" TIMESTAMP WITH TIME ZONE NOT NULL,
   "roundts_month" TIMESTAMP WITH TIME ZONE GENERATED ALWAYS AS date_trunc('month', "roundts") NOT NULL,
   PRIMARY KEY ("roundts", "roundts_month", "id")
)
CLUSTERED BY ("id") INTO 3 SHARDS
PARTITIONED BY ("roundts_month")
WITH (
   "allocation.max_retries" = 5,
   "blocks.metadata" = false,
   "blocks.read" = false,
   "blocks.read_only" = false,
   "blocks.read_only_allow_delete" = false,
   "blocks.write" = false,
   codec = 'best_compression',
   column_policy = 'dynamic',
   "mapping.total_fields.limit" = 1000,
   max_ngram_diff = 1,
   max_shingle_diff = 3,
   number_of_replicas = '0-1',
   "routing.allocation.enable" = 'all',
   "routing.allocation.total_shards_per_node" = -1,
   "store.type" = 'fs',
   "translog.durability" = 'REQUEST',
   "translog.flush_threshold_size" = 536870912,
   "translog.sync_interval" = 5000,
   "unassigned.node_left.delayed_timeout" = 60000,
   "write.wait_for_active_shards" = '1'
)

The only problem is the data is projected to be at least 1.4TB a month in size and this will steadily increase.

I’m looking to break this down more and so was wondering if there is the option of bi-weekly, as I think week number could cause more problems ( 3 shards per table ) .
I know the interval for date_trunc is limited to “weekly”, but is there some sorcery I can use to do bi-weekly ?

Of course, if I’m being ridiculous, please suggest away…

Many thanks
David.

Hi David,
You could use something like

date_bin('2 weeks'::INTERVAL, roundts,'2023-01-01T00:00:00Z'::TIMESTAMP)

but if you are going to partitions over smaller periods of time please keep in mind the total number of shards in the cluster and the size of those against the recommendations in Sharding and Partitioning Guide for Time Series Data - Tutorials - CrateDB Community

I hope this helps.

1 Like

Many thanks Hernan,

This is what I was thinking - that I didn’t want to use week numbers as this would increase the shards quite rapidly. We’re looking to swap data over 60 days to cold storage, which would help, but I’ll work through the guide

Best regards

OK I’ll give this a test to see how it goes.

Just as a side issue, we’re using PHP and with the PDO driver it really does not like “:” in the SQL as its considered a parameter thatn will be passed in from the PHP code, so for the above I’ve used

date_bin( CAST ( '2 weeks' AS INTERVAL), roundts, 1672531200000)

Thanks again.