What are the basic optimization techniques/parameters for a time series table in this example?

Hello,

I’m aware of Optimizing storage for historic time-series data, but in this specific question, I’m not interested data retention or moving old (cold) data to another table, etc.

I want to understand better what kind of basic optimizations/techniques should be used when creating a time series table with the following structure, stored on a 3-node CreateDB cluster, each node with 64 GB RAM, 16 CPU cores and 500 GB storage:

CREATE TABLE IF NOT EXISTS emre.test_table_001 (
  time timestamptz NOT NULL,
  location_id text NOT NULL,
  device_id text NOT NULL, 
  sensor_reading_01 double precision,
  sensor_reading_02 double precision,
  ...
  sensor_reading_50 double precision,
);

Combination of (time, location_id, device_id) is unique.

Let’s assume that:

  • this table stores sensor readings from various devices identified by device_id, residing at a geographical location identified by a location_id.
  • the sensor data is sent to a .NET application, and this .NET application INSERTs data into this table every few seconds.
  • other .NET applications SELECT from this table to show various dashboards to users, whenever users request these via a web application.
  • a user is generally interested in the time series data for 1 or more devices in a single location in a given web page request.
  • currently there are about ~80 devices scattered to ~50 different locations, sending sensor readings every second.
  • hot data’ should be available for about 3 months, meaning, currently hot data is about 100 million rows, but as new locations and devices are added, the amount of ‘hot data’ is estimated to be on the order a few billion rows.

Based on these assumptions and the ‘naive’ table CREATE command above, what would CrateDB experts say, with respect to the following topics and table parameters: