What would be the basic optimization techniques/parameters for a time series table in this example?

Emre_Sevinc · October 8, 2021, 12:16pm

Hello,

I’m aware of the Optimizing storage for historic time-series data tutorial, but in this specific question I’m not interested in data retention, moving old (cold) data to another table, etc.

I want to better understand which basic optimizations/techniques should be used when creating a time series table with the following structure, stored on a 3-node CrateDB cluster where each node has 64 GB RAM, 16 CPU cores and 500 GB storage:

```sql
CREATE TABLE IF NOT EXISTS emre.test_table_001 (
  time timestamptz NOT NULL,
  location_id text NOT NULL,
  device_id text NOT NULL,
  sensor_reading_01 double precision,
  sensor_reading_02 double precision,
  ...
  sensor_reading_50 double precision
);
```

The combination of (time, location_id, device_id) is unique.

Let’s assume that:

- this table stores sensor readings from various devices, identified by device_id, residing at geographical locations identified by location_id;
- the sensor data is sent to a .NET application, which INSERTs data into this table every few seconds;
- other .NET applications SELECT from this table to show various dashboards to users whenever users request them via a web application;
- in a given web page request, a user is generally interested in the time series data for one or more devices at a single location;
- currently there are about ~80 devices scattered across ~50 different locations, sending sensor readings every second;
- ‘hot data’ should be available for about 3 months: currently the hot data is about 100 million rows, but as new locations and devices are added, the amount of hot data is estimated to be on the order of a few billion rows.
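For illustration, the access pattern described above (one location, one or more devices, a time window) could translate into a dashboard query along the lines of the sketch below. The location and device IDs, the one-week range and the hourly bucketing are hypothetical placeholders, not taken from the actual .NET applications.

```sql
-- Hypothetical dashboard query: hourly aggregates for two devices at one location.
-- 'location-42', 'device-001', 'device-002' and the date range are placeholders.
SELECT date_trunc('hour', "time") AS bucket,
       device_id,
       avg(sensor_reading_01) AS avg_reading_01,
       max(sensor_reading_02) AS max_reading_02
FROM emre.test_table_001
WHERE location_id = 'location-42'                -- a single location per page request
  AND device_id IN ('device-001', 'device-002')  -- one or more devices
  AND "time" >= '2021-10-01T00:00:00Z'           -- start of the time window (inclusive)
  AND "time" < '2021-10-08T00:00:00Z'            -- end of the time window (exclusive)
GROUP BY date_trunc('hour', "time"), device_id
ORDER BY bucket, device_id;
```

Because every query of this shape filters on location_id with an equality condition, location_id is a plausible candidate for the CLUSTERED BY routing column asked about in the question.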

Based on these assumptions and the ‘naive’ CREATE TABLE command above, what would CrateDB experts say with respect to the following topics and table parameters:

- Indexing: as far as I understand, CrateDB creates indexes automatically, but coming from a PostgreSQL background, I wonder whether I should also create other indexes manually, based on the conditions used in the WHERE clauses of the .NET applications’ queries?
- the CLUSTERED parameter, as described in the CrateDB CREATE TABLE reference
- the PARTITIONED BY parameter, as described in the CrateDB CREATE TABLE reference
- the number_of_routing_shards parameter, as described in the CrateDB CREATE TABLE reference

proddata · October 14, 2021, 2:59pm

- Typically, one should aim for a single shard size of about 10-50 GiB.
- Partitions can be deleted (dropped) like a table, i.e. if you want to delete old data quickly, you should partition by, e.g., month.
- Typically, having fewer shards is more performant than having too many.
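A minimal sketch of these points, assuming monthly partitions, 6 shards per partition (two per node on the 3-node cluster) and routing by location_id. All of these numbers are assumptions to be validated against real data volumes, not recommendations from this thread. The hypothetical generated column month derives the partition key from time, and the unique (time, location_id, device_id) combination becomes the primary key; CrateDB requires partition columns to be part of the primary key, which does not change uniqueness here because month is derived from time.

```sql
-- Sketch: monthly partitions, 6 shards per partition, rows routed by location_id.
-- The "month" column, the shard count and the routing column are assumptions.
CREATE TABLE IF NOT EXISTS emre.test_table_001 (
  "time" TIMESTAMP WITH TIME ZONE NOT NULL,
  location_id TEXT NOT NULL,
  device_id TEXT NOT NULL,
  sensor_reading_01 DOUBLE PRECISION,
  sensor_reading_02 DOUBLE PRECISION,
  -- sensor_reading_03 to sensor_reading_50 omitted for brevity
  -- Hypothetical partition key, computed by CrateDB on insert
  "month" TIMESTAMP WITH TIME ZONE GENERATED ALWAYS AS date_trunc('month', "time"),
  -- Partition columns must be part of the primary key
  PRIMARY KEY ("time", location_id, device_id, "month")
)
PARTITIONED BY ("month")
CLUSTERED BY (location_id) INTO 6 SHARDS;

-- Dropping old data: a DELETE that filters only on the partition column
-- matches whole partitions, so CrateDB drops the July 2021 partition
-- instead of deleting its rows one by one.
DELETE FROM emre.test_table_001
WHERE "month" = '2021-07-01T00:00:00Z';

-- Average primary shard size per partition, in GiB, to compare with the 10-50 GiB target.
SELECT partition_ident,
       count(*) AS primary_shards,
       avg(size) / 1073741824.0 AS avg_shard_gib
FROM sys.shards
WHERE schema_name = 'emre'
  AND table_name = 'test_table_001'
  AND "primary" = true
GROUP BY partition_ident
ORDER BY partition_ident;
```

number_of_routing_shards is left at its default in this sketch; for a partitioned table, the shard count of future partitions can also be changed later with ALTER TABLE ... SET (number_of_shards = ...). With about 50 locations hashed into 6 shards, shard sizes may end up uneven; if that becomes a problem, dropping the BY (location_id) part falls back to the default routing based on the primary key.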

@jayeff wrote a basic guide for time series data

Also, some core information on sharding:
