Filtering Shards during query with clustering column filter

Francisco_Javier_Par · September 16, 2021, 4:52pm

Hello,

I could not find information in the documentation about whether the shards are filtered during a SELECT query if the filter of the query includes the CLUSTERING KEY.

For example, if a table like the following one

CREATE TABLE test.example (
   "c1" INTEGER,
   "c2" BIGINT,
   "c3" BIGINT,
   PRIMARY KEY ("c1", "c2")
)
CLUSTERED BY ("c1") INTO 6 SHARDS

is filtered with the following query

SELECT *
FROM test.example
WHERE c1 = 55

will the engine skip the shards that do not contain c1=55 or will it still search all the shards even if their distribution was based on this exact field?

Additionally, I am interesting in the equivalent question with the partition key instead. Will only the shards of one partition be searched in this case?

CREATE TABLE test.example (
   "c1" INTEGER,
   "c2" BIGINT,
   "c3" BIGINT,
   PRIMARY KEY ("c1", "c2")
)
PARTITIONED BY ("c1")

proddata · September 17, 2021, 2:01pm

For partitions it is definitely taken into account. You can check that by running your queries with EXPLAIN ANALYZE. This shows which shards are queried.
How it exactly work with only the routin_column being set in CLUSTERED BY I would have to check.

Generally if you are interested in how CrateDB works I can recommend you the excellent articles from @mfussenegger

How CrateDB plans and executes queries
https://zignar.net/2021/05/15/how-cratedb-plans-and-executes-queries/

Distributed SELECT statement execution in CrateDB
https://zignar.net/2021/05/20/distributed-select-statement-execution-in-cratedb/

Query Then Fetch execution in CrateDB
https://zignar.net/2021/05/28/query-then-fetch-execution-in-cratedb/

Group By query execution in CrateDB
https://zignar.net/2021/06/04/group-by-query-execution-in-cratedb/

mfussenegger · September 21, 2021, 2:55pm

Turns out that it didn’t take the clustered by columns into consideration, but will in the future: Optimize routing based on clustered by columns in where clause by mfussenegger · Pull Request #11753 · crate/crate · GitHub

Topic		Replies	Views
Querying Partitions CrateDB	1	393	August 30, 2022
SQL selects retrieve partial data CrateDB sql	2	422	June 5, 2023
Why require the PRIMARY KEY column to be present in a PARTITIONED BY clause? Documentation	1	1964	December 7, 2018
Getting nothing when selecting more than 1 time interval CrateDB	2	497	June 26, 2020
Partioning question CrateDB	3	492	January 16, 2023

Filtering Shards during query with clustering column filter

Related Topics