We are considering to migrate to CrateDB from Postgres as data that we are using are from real-time sensors with timestamps and geospatial information. Crate seems very promising as, according to your homepage, it is very well suited and fast for use cases related to querying timeseries/geo data.
However, at first, I wanted to check performance difference between these two DBs. I’ve imported table with 9.5 millions of rows from Postgres, which contains data from one sensor with timestamps.
Create statement in crateDB:
create table sensor_table( id BIGINT, category varchar, category_id integer, speed float, entry_zone text, exit_zone text, ts timestamp, date_timestamp text, week GENERATED ALWAYS AS date_trunc('week',ts) ) PARTITIONED BY(week);
Create statement in Postgres:
CREATE TABLE IF NOT EXISTS public.sensor_table ( id integer NOT NULL, category character varying COLLATE pg_catalog."default", category_id integer, speed numeric, entry_zone text COLLATE pg_catalog."default", exit_zone text COLLATE pg_catalog."default", ts timestamp without time zone, date_timestamp date, CONSTRAINT sensor_table_pkey PRIMARY KEY (id) )
Then, I tried simple SELECT queries both on Crate and Postgres, to select all data by week/month/any data ranges, e.g.:
select * from sensor_table where ts >= '2021-08-20' and ts < '2021-09-20' limit 10000000;
Here it becomes interesting - in Crate (Admin UI console), this query takes ~7 seconds, while in Postgres (pgAdmin4) only ~2.4 seconds.
Crate vs. Postgres query speed:
Both queries were done on the same datasets, on the same local machine, on Windows OS. Only difference is that Crate in addition to other columns contains generated column
week, by which table is partitioned (without partitioning, this query took more than 30 seconds btw.). Partitioning by month was slower compared to partition by week.
Postgres table contains no partitions nor indexes. No changes in the configuration after local installation were done.
In Crate, I was trying querying with heap size of 2gb, 4gb, then clustering into 2 shards and 4 shards - results were pretty same in every scenario.
So my question is - why is this happening? Is there some major configuration-related thing I forgot to set? Is this somehow related to OS? Do I have to query/create table in different way? Or do I need to use multi-node cluster to actually see the difference?
How to achieve results where it can be clearly seen that Crate outperforms Postgres (from the perspective of querying)?
As you can imagine, query speed is crucial for our use cases, so it must be at least as efficient as in postgres.
Thank you very much for any reaction.