CrateDB Prometheus Adapter croaks with err="context deadline exceeded"

Hey guys,

So I’m currently running a 3-node CrateDB cluster for Prometheus long-term storage, using the CrateDB adapter provided by crateio. I’ve run into an issue where Prometheus attempts to POST data to the remote_write endpoint (the Crate adapter) but is unable to. The logs from the CrateDB adapter are below:

time="2020-04-17T08:46:31Z" level=error msg="Failed to POST inserts to Crate." err="context deadline exceeded" source="server.go:332"
time="2020-04-17T08:46:31Z" level=error msg="Failed to POST inserts to Crate." err="context deadline exceeded" source="server.go:332"
time="2020-04-17T08:46:31Z" level=error msg="Failed to POST inserts to Crate." err="context deadline exceeded" source="server.go:332"
time="2020-04-17T08:46:31Z" level=error msg="Failed to POST inserts to Crate." err="context deadline exceeded" source="server.go:332"
time="2020-04-17T08:46:31Z" level=error msg="Failed to POST inserts to Crate." err="context deadline exceeded" source="server.go:332"
time="2020-04-17T08:46:31Z" level=error msg="Failed to POST inserts to Crate." err="context deadline exceeded" source="server.go:332"

Any help would be great. I’ve tried increasing the scrape_interval in Prometheus, but it doesn’t seem to help. I was thinking that CrateDB might be limiting the number of HTTP connections it accepts, but I can’t find any documentation to support that.

Dear Ryan,

apologies for the very late reply. I answered you recently at Failed to POST/GET data from CrateDB: Croaks with err="context deadline exceeded" · Issue #33 · crate/cratedb-prometheus-adapter · GitHub :

Adding to that, I believe you might have been tripped up by issues like Networking robustness and resiliency on Azure and beyond (AWS, GCP, AliCloud) · Issue #10779 · crate/crate · GitHub. In this context, may I ask whether you are running CrateDB and Prometheus within a typical cloud environment or, otherwise, how specifically the cratedb-prometheus-adapter is connected to CrateDB, network-wise?

With kind regards,
Andreas.

Dear Ryan,

we just added a patch to improve the network behaviour slightly by adjusting the TCP timeout and keepalive settings, see Network behaviour: Adjust TCP timeout and keepalive settings by amotl · Pull Request #44 · crate/cratedb-prometheus-adapter · GitHub. These default values are now used:

  • TCP keepalive interval: 30 seconds
  • TCP connect timeout: 10 seconds

The new -tcp.connect.timeout command line option can be used to adjust the latter parameter.
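For readers wondering what these two settings mean at the Go level, here is a minimal sketch, assuming a plain net.Dialer wired into an HTTP client. The function names (newDialer, defaultDialer, newHTTPClient, describe) are made up for illustration and this is not the adapter's actual code; only the two default values come from the post above.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"time"
)

// newDialer builds a dialer with a TCP connect timeout and a keepalive interval.
// Hypothetical helper for illustration only.
func newDialer(connectTimeout, keepAlive time.Duration) *net.Dialer {
	return &net.Dialer{
		Timeout:   connectTimeout, // upper bound for establishing the TCP connection
		KeepAlive: keepAlive,      // interval between TCP keepalive probes
	}
}

// defaultDialer uses the defaults quoted above: 10s connect timeout, 30s keepalive.
func defaultDialer() *net.Dialer {
	return newDialer(10*time.Second, 30*time.Second)
}

// newHTTPClient shows one way such a dialer could be plugged into an HTTP client.
// This wiring is an assumption, not taken from the adapter.
func newHTTPClient(d *net.Dialer) *http.Client {
	return &http.Client{
		Transport: &http.Transport{DialContext: d.DialContext},
	}
}

// describe renders the dialer settings for logging.
func describe(d *net.Dialer) string {
	return fmt.Sprintf("connect timeout=%s keepalive=%s", d.Timeout, d.KeepAlive)
}

func main() {
	d := defaultDialer()
	_ = newHTTPClient(d) // the client would be used for requests to the database
	fmt.Println(describe(d))
}
```

Note that the connect timeout only bounds how long establishing the connection may take; it says nothing about how long a request may run once the connection is open.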

With kind regards,
Andreas.

P.S.: We just released CrateDB Prometheus Adapter 0.4.0, which is available in the form of release archives [1] and a Docker image [2].

[1] Index of /downloads/dist/prometheus/
[2] https://ghcr.io/crate/cratedb-prometheus-adapter

Hello,

I followed the steps in the tutorial at CrateDB and Prometheus for long-term metrics storage, but in the Prometheus logs I’m also receiving this:

Any help?

Hi @Florencia_Artegoytia,

can you confirm that all Docker containers are running? Is the config file for the prometheus-adapter applied correctly? What do the logs of the cratedb-prometheus-adapter container show?

Hi. Thank you for the quick response,

Yes, all Docker containers are running. I looked at the prometheus.yml file inside the container and confirmed it was applied correctly.

Here are the logs output for cratedb-prometheus-adapter

Hi @Florencia_Artegoytia,

cratedb-prometheus-adapter looks good.

I just followed the tutorial from scratch and it works perfectly for me locally, so I’m a bit puzzled about what doesn’t work on your end.

Can you check if you use latest version of Prometheus (v2.34.0) and CrateDB (v4.7.1)? Do you start all containers as superuser? Maybe this interferes with the containers being able to connect to each other.
How much data do you have in Prometheus?

The error message "context deadline exceeded" indicates some kind of connection or timeout issue. Maybe the containers can’t see each other, or maybe you are transmitting too much data and the connection times out.
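To illustrate where this message comes from: it is the text of Go's context.DeadlineExceeded error, which is returned when an operation does not finish before its context's deadline. The following is a minimal sketch; slowInsert and runWithTimeout are hypothetical stand-ins and not part of the adapter.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// slowInsert stands in for a request that needs `work` to complete.
// It gives up early when the context's deadline passes.
func slowInsert(ctx context.Context, work time.Duration) error {
	select {
	case <-time.After(work):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// runWithTimeout runs slowInsert under a deadline and returns the error text
// if the deadline was exceeded, or an empty string otherwise.
func runWithTimeout(timeoutMs, workMs int) string {
	ctx, cancel := context.WithTimeout(context.Background(),
		time.Duration(timeoutMs)*time.Millisecond)
	defer cancel()

	err := slowInsert(ctx, time.Duration(workMs)*time.Millisecond)
	if errors.Is(err, context.DeadlineExceeded) {
		return err.Error()
	}
	return ""
}

func main() {
	// The work takes longer than the deadline allows, so the call fails.
	fmt.Printf("slow call: %q\n", runWithTimeout(20, 500))
	// The work finishes well within the deadline, so there is no error.
	fmt.Printf("fast call: %q\n", runWithTimeout(500, 10))
}
```

The error itself does not say which side was slow, which is why checking both connectivity and data volume makes sense.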

Dear Florencia,

thank you for writing in. I deliberately moved your question to this discussion to keep things in line.

After Ryan originally reported this issue, we took some actions to adjust the TCP keepalive interval and the TCP connect timeout settings, also making the latter configurable.

We outlined the improvements at Failed to POST/GET data from CrateDB: Croaks with err="context deadline exceeded" · Issue #33 · crate/cratedb-prometheus-adapter · GitHub, where Ryan also posted the context deadline exceeded error he was observing.

However, Ryan never reported back if those improvements have been helpful in any way. May I humbly ask you about this, @RyanWN4?

@Florencia_Artegoytia: Maybe you can increase the connect_timeout setting in your adapter’s config.yml as outlined within [1], bounce your containers and report back about any improvements you might be able to observe?

With kind regards,
Andreas.

[1] CrateDB and Prometheus for long-term metrics storage

Hi again,

Can you check if you use latest version of Prometheus (v2.34.0) and CrateDB (v4.7.1)?

Yes I’m using the latest version.

Do you start all containers as superuser?

Yes.

I think the problem is, the containers don’t see each other. I tried to ping one of the networks and I get no response. My docker-compose file is the same as the tutorial.

Dear Florencia,

From looking at the console output you’ve shared, I agree. Let’s investigate why that would not work on your machine.
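As a rough aid for that investigation, here is a small Go sketch that checks whether a TCP port is reachable from wherever it is run, for example from inside one of the containers. The addresses are passed as arguments; service names such as cratedb:4200 or adapter:9268 are assumptions that depend on your docker-compose file, and reachable and selfCheck are hypothetical helper names.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"time"
)

// reachable reports whether a TCP connection to addr ("host:port")
// can be established within the given timeout.
func reachable(addr string, timeout time.Duration) bool {
	conn, err := net.DialTimeout("tcp", addr, timeout)
	if err != nil {
		return false
	}
	conn.Close()
	return true
}

// selfCheck verifies the probe against a throwaway local listener:
// the port must be reachable while open and unreachable after closing.
func selfCheck() bool {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return false
	}
	addr := ln.Addr().String()
	open := reachable(addr, time.Second)
	ln.Close()
	closed := reachable(addr, time.Second)
	return open && !closed
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: probe host:port [host:port ...]")
		return
	}
	// Probe every address given on the command line.
	for _, addr := range os.Args[1:] {
		fmt.Printf("%s reachable: %v\n", addr, reachable(addr, 2*time.Second))
	}
}
```

If a service name does not resolve or the port is not reachable from a neighbouring container, the problem lies in the container networking rather than in Prometheus or the adapter.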

To help with this, I’ve created a companion repository to the tutorial at GitHub - crate-workbench/cratedb-prometheus-demo: Companion repository to the »CrateDB and Prometheus for long-term metrics storage« tutorial and, just to be safe, verified that it works on both Linux and macOS. The software versions used are:

  • Debian Linux (Ubuntu 20.04.4 LTS)
  • macOS (Catalina 10.15.7)
  • Docker 20.10.14
  • Docker Compose v2.3.3 and v2.4.1

While it shouldn’t make much of a difference, can I humbly ask you to try again using the setup shared within the repository and also share the corresponding software versions of your environment with us?

With kind regards,
Andreas.

Hi,

we are currently testing CrateDB as a long-term storage backend for Prometheus. At the moment, we are remote_writing a small subset of metrics via cratedb-prometheus-adapter-0.4.0 into a stand-alone test instance of CrateDB 5.3.4.

After a few weeks, the DB holds about 835M records (130GiB) and so far it does not look too bad.

However, if we try to run a long query via Prometheus’ data explorer, we end up with a timeout after almost exactly one minute:

remote_read: remote server http://adapter:9268/read returned HTTP status 500 Internal Server Error: context deadline exceeded

Prometheus timeout is set to

remote_read:
- url: http://adapter:9268/read
  remote_timeout: 5m
  follow_redirects: true
  enable_http2: true
  filter_external_labels: true

and the adapter’s timeout is set to connect_timeout: 300.

At the moment, I’m unsure how to proceed/where to look for this specific time-out.

Thanks for any pointer

Carsten

PS: If it matters: CrateDB is a bare-metal installation using upstream’s Debian packages 5.3.4-1~bookworm, and the adapter was simply downloaded from GitHub and started with a systemd service file.

Dear Carsten,

thank you for writing in, and for your excellent report.

After evaluating your observations and revisiting this topic, I think what is missing in CrateDB Prometheus Adapter is the ability to properly configure the TCP read timeout. Currently, only the TCP connect timeout is configurable.

It looks like exposing SetReadDeadline and SetWriteDeadline on the net.Conn object would be the right approach for this.
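To make the idea concrete, here is a minimal sketch of how per-operation deadlines could be applied, assuming a wrapper around net.Conn that refreshes the deadline before every Read and Write. The deadlineConn type and the readTimesOut helper are hypothetical and not part of the adapter; the demo uses an in-memory net.Pipe instead of a real database connection.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
	"time"
)

// deadlineConn wraps a net.Conn and sets a fresh deadline before each
// Read and Write. A zero timeout leaves the respective deadline untouched.
type deadlineConn struct {
	net.Conn
	readTimeout  time.Duration
	writeTimeout time.Duration
}

func (c *deadlineConn) Read(p []byte) (int, error) {
	if c.readTimeout > 0 {
		if err := c.Conn.SetReadDeadline(time.Now().Add(c.readTimeout)); err != nil {
			return 0, err
		}
	}
	return c.Conn.Read(p)
}

func (c *deadlineConn) Write(p []byte) (int, error) {
	if c.writeTimeout > 0 {
		if err := c.Conn.SetWriteDeadline(time.Now().Add(c.writeTimeout)); err != nil {
			return 0, err
		}
	}
	return c.Conn.Write(p)
}

// readTimesOut reads from a pipe whose peer never writes and reports
// whether the read failed because the read deadline expired.
func readTimesOut(timeoutMs int) bool {
	client, server := net.Pipe()
	defer client.Close()
	defer server.Close()

	c := &deadlineConn{
		Conn:        client,
		readTimeout: time.Duration(timeoutMs) * time.Millisecond,
	}
	_, err := c.Read(make([]byte, 1))
	return errors.Is(err, os.ErrDeadlineExceeded)
}

func main() {
	fmt.Println("read timed out:", readTimesOut(50))
}
```

With a wrapper like this, the read timeout could become its own setting next to the existing connect timeout, so long-running reads could be given more time without changing how quickly connection attempts fail.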

Do you agree?

With kind regards,
Andreas.

Hi Andreas,

from what little I understand, that seems to be the correct way, but given that I’ve never written any Go code, and programming beyond simple scripting is not my core strength, I would defer that decision to you and/or other experts.

That being said, I should be able to quickly test any changes if you could provide a test binary (or I will start looking into how to build go binaries from the git repo).

Anyways, thanks a lot for looking into this!

Carsten

Thanks for your offer to be a canary in this regard ;]. I will get back to you as soon as there is something to test. I can’t promise anything for this week, but I will try to get to it next week at the latest.

Well, becoming the canary seems to be the only help I can offer. I’ve quickly browsed the Go net package documentation, https://pkg.go.dev/net/http, and How to set timeout for http.Get() requests in Golang? - Stack Overflow, but I’m none the wiser about how to add this to CrateDB Prometheus Adapter myself.

Thus, I’ll wait patiently.

Dear Carsten,

there is no release or pre-release yet, but at least we have been able to modernize the code base a bit. Other than this, there are two patches specifically addressing performance topics which are probably relevant for you and @RyanWN4.

They originate from both your reports, and other investigations on behalf of reports from others, documented at Investigate how to scale out this service · Issue #72 · crate/cratedb-prometheus-adapter · GitHub. Thanks, @hernanc.

I think both patches are crucial to improving the production-readiness of the application.

We hope to be able to ship a release with those major improvements next week, maybe adding a few more related details.

Have a good weekend, and with kind regards,
Andreas.

Dear Carsten,

we addressed a few of the most prominent performance issues with the most recent release, CrateDB Prometheus Adapter 0.5.0, see also the release notes.

If you still want to give it a try, we will be happy to hear your feedback, and whether the improvements resolve the issues you have been running into. Thank you very much!

With kind regards,
Andreas.