CrateDB as an EDW

kiranbm · January 13, 2022, 1:15pm

Can we use CrateDB as an enterprise data warehouse supporting traditional OLAP and BI workloads? My understanding is that its MPP SQL capabilities (though built on a NoSQL Lucene base) makes it a fit (technically at least).

As you know there are no good open-source parallel SQL databases out there, and conventional enterprise data platforms offered by IBM, Teradata and Oracle don’t look good anymore (for a variety of reasons), why not use CrateDB?

A client of mine is on the look out for alternative MPP SQL platforms to migrate out of Teradata.

hammerhead · January 17, 2022, 1:47pm

Hi @kiranbm,

we talked about this already outside of the community last week, but let me summarize a few points from our conversation here for the benefit of the community.

Although CrateDB historically comes from an IoT/manufacturing background, it is still a generic database product that is not limited to a single vertical. We see a wide range of analytics use cases enabled by CrateDB, such as with marketing or financial data. The benefits of high ingest performance, horizontal scaling and performant analytics queries are relevant there as well.

The shared nothing architecture enables a lot of these benefits, but also has its implications. The distributiveness of data makes joins more expensive, as data needs to be transported between nodes. While this isn’t a problem per-se for a small number of joins with (typically rather small) dimension tables, it can become considerable in data warehousing use cases with large, hierarchical datamodels involving ten or more joins. Replicating dimension tables on all nodes removes the performance problem, but comes at the cost of redundant storage.

The fact that CrateDB doesn’t support transactions can become relevant when dealing with ETL loads populating dimension tables. A complete reload (i.e. deletion of previous rows and inserting of new ones) of a tables’ data cannot be encapsulated in a transaction to remain invisible to other clients. There are strategies to work around this, such as adding a version number for each load and always using in queries the latest version. Once the load for a table has finished successfully, the previous version can safely be deleted.

That being said, the bottom line is that CrateDB is suitable for a lot more than just IoT, it needs to be evaluated on a per-use-case basis if there is a fit with CrateDB’s architecture.

Best

Niklas

Topic		Replies	Views
Replicating data to CrateDB with Debezium and Kafka Tutorials integration , data-ingestion	0	1638	February 28, 2023
How does CrateDB compare to CockroachDB and YugabyteDB? CrateDB	3	1587	November 4, 2021
Migration InfluxDB to CrateDB? (outflux-like) CrateDB	6	832	August 4, 2021
Inserting billions of rows the hard way CrateDB	15	1813	April 6, 2021
Updating data performance CrateDB	0	152	October 31, 2023

CrateDB as an EDW

Related Topics