Migrating from PostgreSQL/Timescale to CrateDB

This article describes how to migrate tables from PostgreSQL (using the Timescale extension) to CrateDB.

Prerequisites

  1. A running Timescale/PostgreSQL server, following one of the options outlined in the Timescale documentation.
  2. In this example, we use the “devices” sample dataset from the Timescale documentation and follow the psql calls as described to create two tables and populate them.
  3. The target tables must exist on CrateDB. In this example, we can re-use the original CREATE TABLE statement.

Export

You can export PostgreSQL tables as CSV or JSON. CSV files will be quicker to generate and take much less disk space. However, the capabilities of CSV files are limited when it comes to more complex table structures, such as JSONB or array columns. If you need more flexibility, choose the JSON export variant.

Particularity for handling timestamp columns

When importing timestamps from a CSV/JSON file into CrateDB, timestamps must be represented as UNIX timestamps with milliseconds precision. This typically requires a transformation step, which is why the export methods described below will differentiate between tables with and without timestamp columns.

Using CSV

  1. Connect to PostgreSQL: $ psql -U <PostgreSQL user> -d <database> -h <host>
  2. Export the source table: In our case, the table has a timestamp column, which is transformed into a UNIX timestamp before generating the CSV: demo=# \copy (SELECT EXTRACT(epoch FROM "time") * 1000 AS "time", device_id, battery_level, [...] FROM readings) TO '~/device_readings.csv' WITH (FORMAT 'csv', DELIMITER ',', QUOTE '"', ENCODING 'UTF-8', HEADER true).
    If your table does not include a timestamp column, you can use a simple SELECT * instead of listing columns explicitly.

Using JSON

  1. Connect to PostgreSQL: $ psql -U postgres -d demo -h 127.0.0.1

  2. There are two ways to proceed, depending on the table layout.

  3. If your table does not contain any timestamp columns, you can go the easy way and export rows using ROW_TO_JSON:

    demo=# \copy (SELECT ROW_TO_JSON(i) FROM readings i) TO '~/device_readings.json' WITH (FORMAT 'text')
    

    Each column will be represented by its original name and value without any transformations.

  4. If your table contains timestamp columns or you wish to transform the values, you need to create the JSON object manually.
    The JSON_BUILD_OBJECT functions takes a variadic list of arguments, alternating as key/value for each column (consult the PostgreSQL documentation for more details).

    demo=# \copy (SELECT JSON_BUILD_OBJECT('time', EXTRACT(epoch FROM "time") * 1000, 'device_id', device_id, 'battery_level', battery_level, [...]) FROM readings) TO '~/device_readings.json' WITH (FORMAT 'text')
    

    The generated JSON will look like this: {"time" : 1479211200000, "device_id" : "demo000000", "battery_level" : 96, [...]}

Arrays/JSONB

Let’s look at a more complex source table definition, e.g. involving arrays and JSONB columns:

CREATE TABLE readings ("time" TIMESTAMP, memory FLOAT[], cpu JSONB);
INSERT INTO readings VALUES ('2016-11-15T12:00:00+00:00', '{650609585, 349390415}', '{"cpu_avg_1min": 5.26, "cpu_avg_5min": 6.172, "cpu_avg_15min": 6.51066666666667}');

Both the array and JSONB column can be included in the generated JSON document by specifying a corresponding key and the column name:

demo=# SELECT JSON_BUILD_OBJECT('time', EXTRACT(epoch FROM "time") * 1000, 'memory', memory, 'cpu', cpu) FROM readings;

The output looks as follows:

{"time" : 1479211200000, "memory" : [650609585,349390415], "cpu" : {"cpu_avg_1min": 5.26, "cpu_avg_5min": 6.172, "cpu_avg_15min": 6.51066666666667}}

In CrateDB, the target table can be defined as follows:

CREATE TABLE readings (
  "time" TIMESTAMP,
  memory INTEGER[],
  cpu OBJECT AS (
    cpu_avg_1min FLOAT,
    cpu_avg_5min FLOAT,
    cpu_avg_15min FLOAT
  )
);

Geo Points

Another example that requires transformations is geo points:

demo=# CREATE TABLE points (p point);
demo=# INSERT INTO points VALUES ((1,2));

ROW_TO_JSON will serialize the record as {"p":"(1,2)"} which cannot be parsed by CrateDB. Instead, we use JSON_BUILD_ARRAY to transform the point into an array:

SELECT JSON_BUILD_OBJECT('p', JSON_BUILD_ARRAY(p[0], p[1])) FROM geo i;

The generated JSON will now be an array that is understood by CrateDB for importing: {"p" : [1, 2]}

Importing into CrateDB

  1. Make the export available to CrateDB via one of the schemes supported by COPY.
  2. Using crash, import the CSV dump to CrateDB: cr> COPY readings FROM '<URI to export file>' RETURN SUMMARY;

Further readings

If your PostgreSQL instance is hosted on AWS RDS, you might want to check out the capability to export directly to S3.

2 Likes