CrateDB Prometheus Adapter croaks with err="context deadline exceeded"

Hey guys,

So I’m currently running a 3-node CrateDB cluster for Prometheus long-term storage, using the CrateDB adapter provided by crateio. I’ve run into an issue where Prometheus attempts to POST data to the remote_write endpoint (the CrateDB adapter) but fails. The logs from the CrateDB adapter are below:

time="2020-04-17T08:46:31Z" level=error msg="Failed to POST inserts to Crate." err="context deadline exceeded" source="server.go:332"
time="2020-04-17T08:46:31Z" level=error msg="Failed to POST inserts to Crate." err="context deadline exceeded" source="server.go:332"
time="2020-04-17T08:46:31Z" level=error msg="Failed to POST inserts to Crate." err="context deadline exceeded" source="server.go:332"
time="2020-04-17T08:46:31Z" level=error msg="Failed to POST inserts to Crate." err="context deadline exceeded" source="server.go:332"
time="2020-04-17T08:46:31Z" level=error msg="Failed to POST inserts to Crate." err="context deadline exceeded" source="server.go:332"
time="2020-04-17T08:46:31Z" level=error msg="Failed to POST inserts to Crate." err="context deadline exceeded" source="server.go:332"

Any help would be great. I’ve tried increasing the scrape_interval in Prometheus, but it doesn’t seem to help. I was thinking CrateDB might be limiting the number of HTTP connections it accepts, but I can’t find any documentation to support that.

Dear Ryan,

apologies for the very late reply. I recently answered you at Failed to POST/GET data from CrateDB · Issue #33 · crate/cratedb-prometheus-adapter · GitHub.

Adding to that, I believe you might have been tripped up by issues like those described in Networking robustness and resiliency on Azure and beyond (AWS, GCP, AliCloud) · Issue #10779 · crate/crate · GitHub. In this context, may I ask whether you are running CrateDB and Prometheus in a typical cloud environment or, if not, how exactly the cratedb-prometheus-adapter is connected to CrateDB, network-wise?

With kind regards,
Andreas.

Dear Ryan,

we just added a patch that slightly improves the network behaviour by adjusting the TCP timeout and keepalive settings; see Network behaviour: Adjust TCP timeout and keepalive settings by amotl · Pull Request #44 · crate/cratedb-prometheus-adapter · GitHub. The following defaults now apply:

  • TCP keepalive interval: 30 seconds
  • TCP connect timeout: 10 seconds

The new -tcp.connect.timeout command line option can be used to adjust the latter parameter.
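To make the two settings more concrete, here is a minimal Python sketch of what a bounded connect timeout and TCP keepalive mean at the socket level. It is not the adapter's code (the adapter is written in Go, whose standard library exposes comparable knobs as `net.Dialer.Timeout` and `net.Dialer.KeepAlive`); the function name `open_connection` and the constants are made up for illustration, with values mirroring the defaults listed above.

```python
import socket

# Illustrative constants mirroring the defaults listed above (assumption:
# these names are hypothetical and not taken from the adapter).
CONNECT_TIMEOUT_S = 10
KEEPALIVE_INTERVAL_S = 30


def open_connection(host, port, connect_timeout=CONNECT_TIMEOUT_S,
                    keepalive_interval=KEEPALIVE_INTERVAL_S):
    """Open a TCP connection with a bounded connect phase and keepalive probes."""
    # Give up if the TCP handshake does not finish within connect_timeout seconds.
    sock = socket.create_connection((host, port), timeout=connect_timeout)
    # Switch back to blocking mode so the timeout applies to the connect phase only.
    sock.settimeout(None)
    # Ask the kernel to probe idle connections so that dead peers are noticed.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Idle time and probe interval options are platform-specific, so guard each one.
    for name in ("TCP_KEEPIDLE", "TCP_KEEPINTVL"):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, keepalive_interval)
    return sock
```

A short connect timeout makes an unreachable endpoint fail fast instead of hanging, while keepalive probes help detect connections that an intermediate network device has silently dropped.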

With kind regards,
Andreas.

P.S.: We just released CrateDB Prometheus Adapter 0.4.0, which is available in the form of release archives [1] and a Docker image [2].

[1] Index of /downloads/dist/prometheus/
[2] https://ghcr.io/crate/cratedb-prometheus-adapter

Hello,

I followed the steps in the tutorial at CrateDB and Prometheus for long-term metrics storage, but in the Prometheus logs I’m also receiving this:

any help?

Hi @Florencia_Artegoytia,

can you confirm that all Docker containers are running? Is the config file for prometheus-adapter applied correctly? What do the logs of the cratedb-prometheus-adapter container show?

Hi. Thank you for the quick response,

Yes, all Docker containers are running. I looked at the prometheus.yml file inside the container and confirmed it was applied correctly.

Here is the log output from cratedb-prometheus-adapter:

Hi @Florencia_Artegoytia,

cratedb-prometheus-adapter looks good.

I just followed the tutorial from scratch and it works perfectly for me locally, so I’m a bit puzzled about what doesn’t work on your end.

Can you check if you use the latest version of Prometheus (v2.34.0) and CrateDB (v4.7.1)? Do you start all containers as superuser? Maybe this interferes with the containers being able to connect to each other.
How much data do you have in Prometheus?

The error message context deadline exceeded indicates some kind of connection or timeout issue. Maybe the containers can’t see each other, or maybe you are transmitting too much data and the connection times out.
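To tell those cases apart, a small check run from inside one of the containers can help. The following is only a sketch, assuming Python is available there; the function name `diagnose` is made up for illustration.

```python
import socket


def diagnose(host, port, timeout=5.0):
    """Classify why a TCP connection to host:port succeeds or fails."""
    try:
        # Step 1: name resolution. Fails if the name is unknown on this network.
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "unresolvable"
    try:
        # Step 2: TCP handshake, bounded by the timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return "reachable"
    except socket.timeout:
        # No answer at all within the timeout, e.g. packets are being dropped.
        return "timeout"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on that port.
        return "refused"
    except OSError:
        # Any other network-level failure, e.g. no route to the host.
        return "unreachable"
```

For example, `diagnose("cratedb", 4200)` would probe CrateDB's default HTTP port; the service name `cratedb` is an assumption here and should be replaced with the name used in your docker-compose file. A result of "unresolvable" would suggest the containers do not share a Docker network, "refused" that the service is not listening on that port, and "timeout" that traffic is being dropped somewhere in between.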

Dear Florencia,

thank you for writing in. I deliberately moved your question to this discussion to keep things in line.

After Ryan originally reported this issue, we took some actions to adjust the TCP keepalive interval and the TCP connect timeout settings, also making the latter configurable.

We outlined the improvements at Failed to POST/GET data from CrateDB: Croaks with err="context deadline exceeded" · Issue #33 · crate/cratedb-prometheus-adapter · GitHub, where Ryan also posted the context deadline exceeded error he was observing.

However, Ryan never reported back if those improvements have been helpful in any way. May I humbly ask you about this, @RyanWN4?

@Florencia_Artegoytia: Maybe you can increase the connect_timeout setting in your adapter’s config.yml as outlined within [1], bounce your containers and report back about any improvements you might be able to observe?

With kind regards,
Andreas.


[1] CrateDB and Prometheus for long-term metrics storage

Hi again,

Can you check if you use latest version of Prometheus (v2.34.0) and CrateDB (v4.7.1)?

Yes I’m using the latest version.

Do you start all containers as superuser?

Yes.

I think the problem is that the containers don’t see each other. I tried to ping one of the networks and got no response. My docker-compose file is the same as the one in the tutorial.

Dear Florencia,

From looking at the console output you’ve shared, I agree. Let’s investigate why that would not work on your machine.

To support the investigation, I’ve created a companion repository to the tutorial at GitHub - crate-workbench/cratedb-prometheus-demo: Companion repository to the »CrateDB and Prometheus for long-term metrics storage« tutorial and, just to be safe, verified that it works on both Linux and macOS. The software versions used are:

  • Debian-based Linux (Ubuntu 20.04.4 LTS)
  • macOS (Catalina 10.15.7)
  • Docker 20.10.14
  • Docker Compose v2.3.3 and v2.4.1

While it shouldn’t make much of a difference, can I humbly ask you to try again using the setup shared within the repository and also share the corresponding software versions of your environment with us?

With kind regards,
Andreas.