Connecting to CrateDB using Trino

Trino (formerly known as Presto SQL) is a distributed query engine, allowing to run analytical queries across different data sources via SQL. One of those data sources can be CrateDB and this article is going to look at how to configure the connection.

Prerequisites

We assume a Trino client/server installation is already in place as per Trino’s installation instructions.

For this post, I installed Trino on macOS using Homebrew with brew install trino and my installation directory is /usr/local/Cellar/trino/368. Depending on your installation method, there might be different ways to start the Trino server. For the sake of this post, I start it in my console from the installation directory with the command ./bin/trino-server run. Your preferred way of starting might differ.

Connector configuration

Due to CrateDB’s PostgreSQL protocol compatibility, we can make use of Trino’s PostgreSQL connector. Create a new file /usr/local/Cellar/trino/368/libexec/etc/catalog/postgresql.properties to configure the connection:

connector.name=postgresql
connection-url=jdbc:postgresql://<CrateDB hostname>:5432/
connection-user=<CrateDB username>
connection-password=<CrateDB password>
insert.non-transactional-insert.enabled=true

Please replace the placeholders for the CrateDB hostname, username, and password to match your setup. Besides the connection details, the configuration has two particularities:

  • No database name: With PostgreSQL, a JDBC connection URL usually ends with a database name. We intentionally omit the database name when connecting to CrateDB for compatibility reasons.
    CrateDB consists of a single database with multiple schemas, hence we do not specify a database name in the connection-url. If a database name is specified, you will run into an error message on certain operations (ERROR: Table with more than 2 QualifiedName parts is not supported. Only <schema>.<tableName> works).
  • Disabling transactions: Being a database with eventual consistency, CrateDB doesn’t support transactions. By default, the PostgreSQL connector will wrap INSERT queries into transactions and attempt to create a temporary table. We disable this behavior with the insert.non-transactional-insert.enabled parameter.

Running queries against CrateDB

Once the PostgreSQL connector is configured, we can connect to the Trino server using its CLI:

# schema refers to an existing CrateDB schema
$ ./bin/trino --catalog postgresql --schema doc
trino:doc>

A SHOW TABLES query should successfully list all existing tables in the specified CrateDB schema and you can proceed with querying them.

As CrateDB differs in some aspects from PostgreSQL, there are a few particularities to consider for your queries:

  • Querying OBJECT columns: Columns of the data type OBJECT can usually be queried using the bracket notation, e.g. SELECT my_object_column['my_object_key'] FROM my_table. In Trino’s SQL dialect, the identifier needs to be wrapped in double quotes, such as SELECT "my_object_column['my_object_key']" FROM my_table.
  • Inserting into OBJECT columns: Inserting rows into an OBJECT column can usually be done by passing the JSON payload as a string, like INSERT INTO my_table VALUES ('{"a": 1}'). This currently fails with Trino as it can’t interpret the OBJECT data type. A solution is expected with the upcoming CrateDB 4.7 release which adds the JSON data type as an alias. We will revisit this aspect once released.
  • Creating tables: Not all of Trino’s data types can be mapped to CrateDB data types as part of a CREATE TABLE statement. If you run into data type mapping issues, we recommend creating tables directly in CrateDB and using Trino for querying/inserting data.

Conclusion

With a few parameter tweaks, Trino can successfully connect to CrateDB. The information presented in this post is the result of a short compatibility test and is likely not exhaustive. If you use Trino with CrateDB and are aware of any additional aspects, please let us know!

5 Likes