What are the compression performance and characteristics compared to TimescaleDB?

proddata · September 9, 2021, 8:32pm

By default, CrateDB uses LZ4 to compress document sources. In addition, doc values (the columnar store) are compressed with delta encoding, bit-packing, and GCD compression. Tables can also use deflate instead of LZ4 to reduce storage requirements even further. All of that works without any “hackery” such as combining multiple rows into arrays and then applying standard encoding techniques like delta-of-delta encoding.
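As a rough sketch of the deflate option mentioned above: in CrateDB the compression is chosen per table through the `codec` table parameter, where `default` means LZ4 and `best_compression` means deflate. The table name `readings_deflate` and the reduced column list are only for illustration.

```sql
-- Hypothetical table for illustration; 'best_compression' selects deflate
-- instead of the default LZ4 for the stored document sources.
CREATE TABLE readings_deflate (
    time TIMESTAMP WITH TIME ZONE NOT NULL,
    device_id TEXT,
    battery_level DOUBLE PRECISION
) WITH (codec = 'best_compression');

-- After loading data, compare the on-disk size (in bytes) of the primary shards
-- with the same query against a table that uses the default codec.
SELECT sum(size) AS bytes_on_disk
FROM sys.shards
WHERE table_name = 'readings_deflate'
  AND "primary" = true;
```

Deflate trades some read speed for the smaller footprint, so it is worth measuring with your own data before switching.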

Some simple tests with Timescale-provided data sets showed that CrateDB typically compresses better than Timescale without chunk compression. That being said, that data was rather optimised for delta-of-delta encoding and XOR compression.

CREATE TABLE readings (
    time  TIMESTAMP WITH TIME ZONE NOT NULL,
    device_id  TEXT,
    battery_level  DOUBLE PRECISION,
    battery_status  TEXT,
    battery_temperature  DOUBLE PRECISION,
    bssid  TEXT,
    cpu_avg_1min DOUBLE PRECISION,
    cpu_avg_5min DOUBLE PRECISION,
    cpu_avg_15min DOUBLE PRECISION,
    mem_free DOUBLE PRECISION,
    mem_used DOUBLE PRECISION,
    rssi  DOUBLE PRECISION,
    ssid  TEXT
);

All databases were set up with an index on device and time.

That being said, with chunk compression enabled Timescale is more efficient (~6GB in this case) for use cases with very few indexes, but this quickly turns around with higher-cardinality data and more indexes. You can also achieve many of Timescale’s compression characteristics simply by using arrays in CrateDB (remember, it is mostly a document store). So if you really want to squeeze your data a bit more, most of the Timescale “magic” can be achieved by moving data into a second table and using arrays. Also see “Optimizing storage for historic time-series data”.
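A minimal sketch of the second-table-with-arrays idea, assuming the `readings` table from above. The archive table name, the `day_start` column, and the choice of one row per device and day are assumptions made for illustration, not a recommended layout.

```sql
-- Hypothetical archive table: one row per device and day,
-- with the individual readings packed into arrays.
CREATE TABLE readings_archive (
    device_id TEXT,
    day_start TIMESTAMP WITH TIME ZONE,
    times ARRAY(TIMESTAMP WITH TIME ZONE),
    battery_levels ARRAY(DOUBLE PRECISION),
    cpu_avg_1min ARRAY(DOUBLE PRECISION)
);

-- Move historic data over by aggregating many rows into one.
-- The cutoff date is an example value.
INSERT INTO readings_archive (device_id, day_start, times, battery_levels, cpu_avg_1min)
SELECT device_id,
       date_trunc('day', time),
       array_agg(time),
       array_agg(battery_level),
       array_agg(cpu_avg_1min)
FROM readings
WHERE time < '2021-08-01'
GROUP BY 1, 2;
```

The element order produced by `array_agg` is not guaranteed to be sorted by time, so check the archived rows against the originals before deleting anything from `readings`.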

Another valid option is to use the snapshot mechanism of CrateDB and move old partitions of data, further compressed using gzip, to a low-cost blob store like S3 or Azure Storage. If you need that data again, you can restore it with a one-liner.
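The snapshot route could look roughly like the following. It assumes a partitioned table, here a hypothetical `readings_parted` with a generated `month` partition column, and an S3 repository called `s3_archive`. The bucket name and credentials are placeholders.

```sql
-- Hypothetical partitioned variant of the readings table.
CREATE TABLE readings_parted (
    time TIMESTAMP WITH TIME ZONE NOT NULL,
    month TIMESTAMP WITH TIME ZONE GENERATED ALWAYS AS date_trunc('month', time),
    device_id TEXT,
    battery_level DOUBLE PRECISION
) PARTITIONED BY (month);

-- Register the blob store as a snapshot repository (placeholder values).
CREATE REPOSITORY s3_archive TYPE s3
WITH (bucket = 'my-archive-bucket', access_key = '<access key>', secret_key = '<secret key>');

-- Snapshot a single old partition, then drop it from the cluster.
CREATE SNAPSHOT s3_archive.readings_2021_08
TABLE readings_parted PARTITION (month = '2021-08-01')
WITH (wait_for_completion = true);

DELETE FROM readings_parted WHERE month = '2021-08-01';

-- The restore one-liner, should the data be needed again.
RESTORE SNAPSHOT s3_archive.readings_2021_08 TABLE readings_parted PARTITION (month = '2021-08-01') WITH (wait_for_completion = true);
```

Deleting by the exact partition value drops the whole partition, which is what frees the local storage.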

The question might arise: if the techniques used are so simple, why don’t we just integrate them? The short answer is that we want to offer our users flexibility in how they use CrateDB. The compression techniques used by Timescale limit what you can store and how you can store it. For example, JSON data can’t be compressed using those techniques. Furthermore, updates, deletes, and inserts in compressed chunks are limited.

Nevertheless, we are looking further into storage optimisation, and recent results are promising, saving quite a lot of storage without any real performance caveats.
