CrateDB/postgres_fdw/LIMIT parameter

Hey all,

I am very new to CrateDB. We have previously been using Promscale and TimescaleDB for remote storage of Prometheus metrics, but Promscale is now deprecated and unsupported, meaning we need to look at other options.

CrateDB seems really nice, and well suited for our needs as we maintain our “non time-series” relational data in Postgres anyway, so the support for Postgres protocols is very appealing.

I have managed to get Crate ingesting metrics from Prometheus without issue, and can manually use psycopg2 to connector, get cursor, and query as you would expect. The multi-dimensional labels are stored as a simple key/value object within CrateDB.

A fairly major limitation is that our reporting engine is limited in as much as we cannot do cross-database joins - e.g. our non time-series data to the metric data now stored in CrateDB.

To try and get around this, I have experimented with postgres_fdw and added Crate as a remote server, and actually had success creating a remote table mapping from our standard Postgres to the CrateDB metrics table using jsonb as the field type in Postgres to represent the label object column in Crate. This seems to query just fine in normal Postgres client/connection:

create foreign table doc.recent_metrics (timestamp timestamptz, labels_hash text, labels jsonb, value double precision) server crate;
select * from doc.metrics where timestamp > '2023-03-18 21:40';
2023-03-18 21:45:06.031+00 | 8bbaad138daa6a0e | {"app": "APP1", "job": "SNMP_RaritanPX", "type": "snmp", "__name__": "scrape_samples_scraped", "instance": "snmp_raritan_simulator", "config_name": "Raritan SNMP"}

However, things fall down when applying a simple limit clause to a query:

select * from doc.metrics limit 1;
ERROR:  line 2:72: mismatched input '::' expecting {<EOF>, ';'}
remote SQL command: SELECT "timestamp", labels_hash, labels, value FROM doc.metrics LIMIT 1::bigint

It seems as though Postgres is forcibly casting the limit parameter to a type before remotely querying Crate, which clearly CrateDB doesn’t like.

This seems like such a minor issue in the scheme of things - I wondered if anybody else had experienced anything like this and could advise?

Thanks in advance,
Ian

1 Like

Hi Ian,
I just tested this with:

CREATE EXTENSION postgres_fdw;

CREATE SERVER remotecratedb 
	FOREIGN DATA WRAPPER postgres_fdw 
	OPTIONS (host '192.168.149.97',port '5432',dbname 'postgres');

CREATE USER MAPPING
	FOR CURRENT_USER 
	SERVER remotecratedb 
	OPTIONS(user 'crate', password '');

using DBeaver to connect to PostgreSQL and version 5.2.2 of CrateDB behind

and LIMIT is working correctly for me without issues.

One thing that I did differently is that your CREATE FOREIGN TABLE snippet uses doc.recent_metrics but later you query doc.metrics, I used doc.metrics consistently.
If that does not help maybe you could share your CREATE SERVER command, the version of PostgreSQL you are running, and how your are running the command against the PostgreSQL instance?

Thank you.

1 Like

Hi,

Thanks for the prompt response - here are the commands I have issued on Postgres. I’m running in a dockerised setup, so “cratedb” is the crate hostname, and “db” is the postgres hostname:

select version()

PostgreSQL 14.6 (Ubuntu 14.6-1.pgdg22.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0, 64-bit

create server if not exists crate foreign data wrapper postgres_fdw options (host ‘cratedb’);
create user mapping for CURRENT_USER SERVER crate OPTIONS (user ‘crate’);

The recent_metrics/metrics are the same thing (recent is just a quick view I set up in crateDB to test), but limit fails on both.

I note that you used ‘postgres’ in the db name for the foreign data wrapper command, is that required for compatibility mode?

Thanks again,
Ian

By means of an update, I upgrade to PostgreSQL 15, started from a blank DB and issued the following commands:

create schema doc;
create server crate foreign data wrapper postgres_fdw options (host ‘cratedb’, dbname ‘postgres’);
create user mapping for current_user server crate options (user ‘crate’, password ‘’);
create foreign table doc.metrics (timestamp timestamptz, labels_hash text, labels jsonb, value double precision) server crate;

SELECT * FROM doc.metrics LIMIT 1;

I receive the same error:

remote SQL command: SELECT “timestamp”, labels_hash, labels, value FROM public.metrics LIMIT 1::bigint
ERROR: line 2:82: mismatched input ‘::’ expecting {, ‘;’}
CONTEXT: io.crate.exceptions.SQLExceptions.esToCrateException(SQLExceptions.java:162)

I am using crate from the latest docker image, so a 5.2 version.

Could I ask what version of PostgreSQL you have success with?

Thanks,
Ian

Only me again :slight_smile:

Further update on this - when querying through Python/PsycoPG/Django the limit works fine as the library does not appear to be casting LIMIT to bigint.

When querying through the psql command line OR the latest version of metabase, the issue occurs. The latter is the one I really would like to get working for testing as everything else seems really positive with regards to CrateDB.

I assume the casting is to do with the underlying Postgres client library version used by the different tools. I will see if there are any connection options that could potentially affect this also.

Okay, further investigation seems that maybe this isn’t version related at all:

The following query:

select * from doc.metrics_by_name limit 10;

Causes the mismatch error:

ERROR:  line 2:81: mismatched input '::' expecting {<EOF>, ';'}
CONTEXT:  io.crate.exceptions.SQLExceptions.esToCrateException(SQLExceptions.java:162)

However, passing a WHERE clause seems to work fine:

select * from doc.metrics_by_name WHERE metric_name='node_cpu_seconds_total' limit 100;
       metric_name       |                                                                                                   labels                                                        
                                           |         timestamp          |  value
------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+---------
 node_cpu_seconds_total | {"app": "APP1", "cpu": "0", "job": "test_nodes", "mode": "irq", "type": "node_exporter", "__name__": "node_cpu_seconds_total", "instance": "node_exporter_2:9100", "config_name": "Lab Node Exporter"}     | 2023-03-18 14:58:53.128+00 |   16.84

Looks like this may be a minor query parsing bug?

Okay, even more strangeness - LIMIT queries only work if there is a WHERE clause applied, and the WHERE field is a string (or at least dates alone fail):

Working:

select * from doc.metrics_by_name WHERE metric_name=‘node_cpu_seconds_total’ limit 100;
select * from doc.metrics_by_name WHERE metric_name=‘node_cpu_seconds_total’ and timestamp > ‘2023-03-18’ limit 100;

Not Working:

select * from doc.metrics_by_name WHERE timestamp > ‘2023-03-18’ limit 100;
select * from doc.metrics_by_name limit 100;

Annoyingly I can’t seem to get postgres_fdw to send the actual query it sends.

Do you know what kind of query ends up in CrateDB?
You can check in the sys.jobs_log table.

Generally speaking CrateDB would cast date strings to timestamps implicitly.

Thanks for the response - nice feature. This is what is shown in logs for the error:

select * FROM doc.metrics_by_name LIMIT 10;

And here is a successful query:

select * from doc.metrics_by_name where metric_name = ‘node_cpu_seconds_total’ limit 10;

I note that there’s no limit clause at all on the “success” query, and it fetches forward more than the 10 rows requested.

I obtained these logs with the following Crate query, so let me know if there is anything else you need me to run for more details.

select * from sys.jobs_log WHERE username=‘xxx’ AND error [IS|IS NOT] NULL ORDER BY started desc;

Thanks,
Ian

Had to attach the “success” response screen shot separately as I am a new user:

Hi again Ian,
Apologies for the delay coming back to you.

I note that you used ‘postgres’ in the db name for the foreign data wrapper command, is that required for compatibility mode?

this does not change the result, we can have multiple schemas, but there is only one database in CrateDB, it is currently possible to use either “postgres” or “crate” as database name when connecting with a PostgreSQL client.

I note that there’s no limit clause at all on the “success” query, and it fetches forward more than the 10 rows requested.

this can be controlled passing fetch_size under OPTIONS in the CREATE SERVER statement, for instance

OPTIONS (host 'cratedbcontainer',port '5432',dbname 'postgres', fetch_size '70');

Could I ask what version of PostgreSQL you have success with?

I have done a few tests and I believe the determining factor is the version of PostgreSQL:

PostgreSQL version Result
10.22 working
11.19 working
12.14 not working
15.2 not working

Please note 11.19 is still a supported version so this could be an alternative.
I have raised a feature request
Support casting in LIMIT clauses for compatibility with postgres_fdw (12+) · Issue #13835 · crate/crate (github.com)
so that CrateDB can accept this syntax in the LIMIT clause.

I hope this helps.

2 Likes

Thank you hernanc for the testing, I may well test with 11 for the time being.

I don’t know if you saw my other posts about the syntax changing, and being acceptable, when strings are used in WHERE clauses and so on. I am not sure if you would like me to comment on the feature request or whether you would add this information.

E.g. the following query works just fine in postgres 15:

select * from doc.metrics_by_name WHERE metric_name=‘node_cpu_seconds_total’ limit 100;

Hi,
Yes, I read your messages, and please feel free to comment on the feature request, but my thinking was that while it was possible to get it to avoid the cast in the LIMIT clause tweaking the query the only way for this to work in all cases, in particular with tools where you have no control on the query, would be for CrateDB to support the syntax with the cast in the LIMIT.

1 Like

Thanks for the support hernanc and @proddata.

Agreed this would be the best case, and hopefully isn’t too huge a feature to implement.

Hi, just a quick update on this,
Add CAST support for LIMIT and OFFSET by matriv · Pull Request #13954 · crate/crate (github.com)
was merged into master last week, but it did not arrive on time to make it for 5.3 so it will be included in the next feature release (5.4).

2 Likes