Auto increment primary key to identify rows insertion order

Alin_Mihut · March 20, 2023, 7:14am

Hello guys,

CrateDB offers lots of options in terms of auto-generated columns, system fields, and text-based row identifiers. This is great, but I couldn’t find anything in the documentation on how to identify rows by their insertion order.

The system field _id is of type text, so if you query a table with ORDER BY _id DESC you don’t get the rows back in insertion order (newest to oldest).

My use case is quite complex. Basically, I have a cluster with x nodes, each ingesting data into CrateDB at its own pace. So I can’t rely on a timestamp field to paginate and sort the results from newest to oldest or oldest to newest, because each node can insert multiple rows with the same timestamp. I am referring to cursor-based pagination with next (pointer field value < cursor value) and previous (pointer field value > cursor value) conditions.

In the use case I presented above, caching the row ID on the client side and incrementing it for each row in the batch insert request to CrateDB works only if the cluster has a single active node.

My question is:

Is it possible to use something similar to the SERIAL type in PostgreSQL when creating the tables?

For example, by using generated columns with auto-increment capabilities, creating a sequence on the server side to generate the next row ID, or using a trigger.

Any feedback, ideas, or alternatives to achieve this are highly appreciated.

Thank you

proddata · March 20, 2023, 1:43pm

CrateDB - as a highly scalable database designed to handle hundreds of nodes - does not support a global auto-increment or serial data type. This is because such a feature would necessitate synchronization among all nodes during data ingestion. The strength of CrateDB lies in its ability to allow nodes, and even individual shards within nodes, to operate largely independently and concurrently. A global counter would hinder this parallel processing, undermining the core advantages of CrateDB.

You could instead use a generated timestamp column, defined at the table level, for example:

CREATE TABLE t01 (
  my_val DOUBLE,
  insert_ts TIMESTAMP GENERATED ALWAYS AS now()
);
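Since several rows can share the same insert_ts, a cursor built on the timestamp alone can skip or repeat rows. One common workaround, sketched below in Python purely as an illustration, is to paginate on the pair (insert_ts, tie-breaker). The row_id field is an assumption and not part of the table above: it stands for any unique value per row, such as a client-generated UUID. Note that this only gives a stable, repeatable order for pagination; it does not recover the true insertion order of rows that share a timestamp.

```python
from typing import List, NamedTuple, Optional, Tuple


class Row(NamedTuple):
    insert_ts: int   # milliseconds since epoch, like the generated column above
    row_id: str      # hypothetical unique tie-breaker (e.g. a client-side UUID)
    my_val: float


# A cursor is the (insert_ts, row_id) pair of the last row already returned.
Cursor = Tuple[int, str]


def next_page(
    rows: List[Row], cursor: Optional[Cursor], limit: int
) -> Tuple[List[Row], Optional[Cursor]]:
    """Return up to `limit` rows strictly older than `cursor`, newest first."""
    # Sort on the composite key so rows with equal timestamps have a fixed order.
    ordered = sorted(rows, key=lambda r: (r.insert_ts, r.row_id), reverse=True)
    if cursor is not None:
        # Tuple comparison: older timestamp, or same timestamp and smaller row_id.
        ordered = [r for r in ordered if (r.insert_ts, r.row_id) < cursor]
    page = ordered[:limit]
    # Keep the previous cursor when there is nothing left to return.
    new_cursor = (page[-1].insert_ts, page[-1].row_id) if page else cursor
    return page, new_cursor


# Three rows share the same timestamp, which a timestamp-only cursor cannot split.
rows = [
    Row(1000, "a", 1.0),
    Row(1000, "b", 2.0),
    Row(1000, "c", 3.0),
    Row(2000, "d", 4.0),
]
page1, cur1 = next_page(rows, None, 2)
page2, cur2 = next_page(rows, cur1, 2)
page3, cur3 = next_page(rows, cur2, 2)
```

On the SQL side, the same idea would roughly correspond to a condition like insert_ts < ? OR (insert_ts = ? AND row_id < ?) combined with ORDER BY insert_ts DESC, row_id DESC, again assuming such a row_id column exists in your table.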
