Is it possible to get rid of empty partitions?

Hello,

I created a partitioned table as the following:

CREATE TABLE IF NOT EXISTS log.partitioned_requests (
  datetime TIMESTAMP WITH TIME ZONE,
  ts_month TIMESTAMP WITH TIME ZONE GENERATED ALWAYS AS date_trunc('month', datetime),
  user_id VARCHAR(37),
  trace_id VARCHAR(37),
  user_name TEXT,
  user_email TEXT,
  protocol TEXT,
  method TEXT,
  scheme TEXT,
  host TEXT,
  path TEXT,
  query TEXT,
  content_type TEXT,
  request_duration BIGINT,
  response_duration BIGINT,
  response_code INTEGER,
  user_agent TEXT,
  remote_ip IP,
  referer TEXT
) PARTITIONED BY (ts_month, user_id) ;

And then inserted a few million rows into it, creating 85 partitions and 1020 shards.

Then I deleted rows from that table:

DELETE FROM log.partitioned_requests WHERE datetime < '2021-12-01';
DELETE FROM log.partitioned_requests WHERE datetime < '2022-01-01';
DELETE FROM log.partitioned_requests WHERE datetime < '2022-02-01';
DELETE FROM log.partitioned_requests WHERE datetime < '2022-03-01';

Everything went fine and all of the rows were deleted. But when I looked in the web-based administrative User Interface, I saw that partitions were still there.

I observed that size (sum of primary shards) of log.partitioned_requests was about 1000 MiB in the beginning, but after the DELETE queries it started to drop: 950 MiB , 926.6 MiB , 826.9 MiB , …, 177 MiB . It stabilized at 177 MiB in a few minutes .

I know 177 MiB isn’t that much, and not that small (is it metadata?) but if I want to get rid of those partitions, how can I do that?

Should I ‘close’ the partition as described in Partitioned tables — CrateDB: Reference?

Hi @Emre_Sevinc,

can you clarify which CrateDB version you are running? Also what column are you partitioning by (e.g. date trunced timestamp? something else?)?

If empty partitions are not dropped after a DELETE that would be likely a bug with CrateDB. I’ll test this later today to see if I can reproduce this behaviour.

In the meantime you should be able to delete your empty partitions with directly referencing the partition column:

DELETE FROM log.partitioned_requests 
  WHERE <partition_column> = <partition_value>;

Or delete multiiple partitions at once:

DELETE FROM log.partitioned_requests 
  WHERE <partition_column> IN (<partition_value1>, <partition_value2>,...);
1 Like

Hello,

I’m using version 4.7.0 on a 3-node system (each node running Debian GNU/Linux).

I partitioned the table based on (ts_month, user_id) column combination, and those column types are:

ts_month TIMESTAMP WITH TIME ZONE GENERATED ALWAYS AS date_trunc('month', datetime),
user_id VARCHAR(37),

I will try deleting by using equality (=) operator. I thought CrateDB would infer that partition would become empty by looking at the time ranges I gave (using < operator), but maybe I misunderstood that part. I will test and report back

OK, apparently I need to give the exact values for partition columns. When I did that I was able to get rid of partitions.

Thanks for the clarification.

1 Like

Yes, that would be the expected behaviour. Maybe CrateDB optimiser can not reliable infer if all records of a partition match with two partition colums

From the docs:

Deleting data from a partitioned table is cheap if full partitions are dropped. Full partitions are dropped with DELETE statements where the optimizer can infer from the WHERE clause and partition columns that all records of a partition match without having to evaluate against the records.
Source: Partitioned tables — CrateDB: Reference

2 Likes