Monitoring CrateDB Cloud clusters

This tutorial demonstrates how you can monitor your CrateDB Cloud cluster using the exposed Prometheus metrics.

Prometheus scrapes the API endpoint that exposes the metrics, and the visualization tool Grafana uses it as a data source to visualize them. The returned metrics are aggregated across all the clusters in the specified organization.

Prerequisites

  • Both Prometheus and Grafana are run as Docker containers in this tutorial, so you need Docker installed on your system.

Cluster Deployment

The first step is to sign up for the CrateDB Cloud Console if you haven’t done so yet. After that, you can deploy your cluster.

Prometheus

Prometheus is used to scrape the CrateDB Cloud API endpoints for available metrics and serve as a data source for Grafana.

First, you need to save the following configuration as a .yml file (e.g. prometheus.yml) on your system:

# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# here it's the CrateDB Cloud metrics endpoint.
scrape_configs:
  - job_name: "cratedb"
    metrics_path: '/api/v2/organizations/{{ORGID}}/metrics/prometheus/'
    basic_auth:
      username: '{{APIKEY}}'
      password: '{{SECRET}}'
    static_configs:
      - targets: ["console.cratedb.cloud"]

Substitute {{ORGID}} with the ID of your organization. It can be found on the Settings page in the CrateDB Cloud Console.

Your API credentials can be found on the Account page. Make sure to store your API secret securely, as it’s shown only once when you create your API key.
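
Before starting Prometheus, you can optionally check that the Organization ID and API credentials are correct by calling the metrics endpoint directly. The following is a minimal sketch for a Unix-like shell; the placeholder values are assumptions and must be replaced with your own:

# Hypothetical placeholder values; replace with your Organization ID and API credentials
ORGID="your-organization-id"
APIKEY="your-api-key"
SECRET="your-api-secret"

# Request the metrics endpoint with HTTP basic auth; a successful call returns
# plain-text metrics in the Prometheus exposition format
curl -s -u "$APIKEY:$SECRET" \
  "https://console.cratedb.cloud/api/v2/organizations/$ORGID/metrics/prometheus/"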

Once you have added your Organization ID and API credentials, execute the following command to create a Prometheus instance:

docker run -d --name prometheus -v /Users/crate/prometheus.yml:/etc/prometheus/prometheus.yml -p 9090:9090 prom/prometheus

It’s important to use the full path to the .yml file in the docker run command.

This will start the Prometheus instance exposed on port 9090. You can verify it’s running correctly by visiting http://localhost:9090/. On the Status -> Targets page in the top menu, there should be an endpoint with your Organization ID in the state UP. This means that Prometheus is able to connect to the API and is scraping the available metrics.
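
You can also inspect the scrape target from the command line through Prometheus’ HTTP API. A small sketch, assuming the container from above is running and jq is installed for readable output:

# List the active scrape targets and show their health and last error (if any)
curl -s http://localhost:9090/api/v1/targets \
  | jq '.data.activeTargets[] | {scrapeUrl, health, lastError}'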

Available Metrics

Most metric semantics are self-explanatory. This list is not exhaustive, and new metrics can be added at any point in the future. All metrics are per node.

Metric | Type | Description
container_cpu_usage_seconds_total | Counter | CrateDB CPU usage, in seconds
container_fs_reads_bytes_total | Counter | Number of bytes read, per disk
container_fs_writes_bytes_total | Counter | Number of bytes written, per disk
container_memory_usage_bytes | Gauge | Memory usage
container_network_receive_bytes_total | Counter | Network ingress traffic
container_network_transmit_bytes_total | Counter | Network egress traffic
crate_circuitbreakers | Gauge | Circuit breaker statistics, per breaker
crate_cluster_state_version | Gauge | Information about the cluster’s state
crate_connections | Gauge | Number of connections, per protocol
crate_node | Gauge | Shard statistics
crate_query_failed_count | Counter | Number of failed queries, per type (i.e. Insert/Select/Update/…)
crate_query_sum_of_durations_millis | Counter | Sum of the durations of all queries, per query type
crate_query_total_count | Counter | Total number of queries, per type
crate_ready | Gauge | Indicates whether this CrateDB node is up and running
crate_threadpools | Gauge | Thread pool statistics, per pool
jvm_* | Gauge | Various JVM statistics
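
To get a feel for these metrics before building any dashboards, you can query them directly through Prometheus’ HTTP query API. A sketch, assuming the Prometheus instance from above is reachable on localhost:9090; the PromQL expressions only use metric names from the table:

# Current number of connections, with all label combinations
curl -sG http://localhost:9090/api/v1/query \
  --data-urlencode 'query=crate_connections'

# Per-second query rate over the last 5 minutes, derived from the counter
curl -sG http://localhost:9090/api/v1/query \
  --data-urlencode 'query=rate(crate_query_total_count[5m])'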

Grafana

Grafana doesn’t need any special configuration. You can run it either in a Docker container or as a local installation; either works for this use case. Follow the Grafana documentation and use your preferred method.

We used the Docker image to run Grafana:

docker run -d -p 3000:3000 grafana/grafana-oss

By default, Grafana is exposed on port 3000. Go to http://localhost:3000/ to access it.
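
You can quickly confirm that the instance is up from the command line via Grafana’s health endpoint (a sketch, assuming the container from above):

# Returns a small JSON document describing the state of the Grafana instance
curl -s http://localhost:3000/api/health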

Data source

Now you can add Prometheus as a data source in Grafana under Configuration -> Data sources. Choose Prometheus, use http://host.docker.internal:9090/ as the URL, and leave the rest at the defaults.
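
If you prefer to script this step instead of clicking through the UI, the data source can also be created through Grafana’s HTTP API. A sketch, assuming a fresh container that still uses the default admin/admin credentials (change the password for any real setup):

# Create a Prometheus data source pointing at the Prometheus container
curl -s -X POST http://admin:admin@localhost:3000/api/datasources \
  -H 'Content-Type: application/json' \
  -d '{
        "name": "Prometheus",
        "type": "prometheus",
        "url": "http://host.docker.internal:9090",
        "access": "proxy",
        "isDefault": true
      }'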

Dashboard

All that’s left is to create a dashboard or import one that we prepared for you. Simply save this snippet as a .json file and import it under Dashboards -> New -> Import. Click the “Upload JSON file” button and choose the file. The dashboard will be called “CrateDB Cluster Monitoring”.
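
Alternatively, the saved JSON can be imported programmatically through Grafana’s dashboard API. A sketch, assuming the file is named dashboard.json, jq is installed, and the default admin/admin credentials are still in place; depending on how the JSON was exported, the UI import may handle data source mapping more gracefully:

# Wrap the dashboard JSON in the envelope the API expects and post it
jq '{dashboard: ., overwrite: true}' dashboard.json \
  | curl -s -X POST http://admin:admin@localhost:3000/api/dashboards/db \
      -H 'Content-Type: application/json' \
      -d @-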

The dashboard displays the following metrics. The values are aggregated from all the running clusters in your organization:

  • Global stats:
    • Number of nodes
  • Cluster stats:
    • Type and number of open connections to your clusters
    • SELECT queries per second
    • INSERT queries per second
    • CPU usage (Cores)
    • Memory usage
    • File system writes
    • File system reads
  • Query stats:
    • Error rate along with the type of failed query
    • Average query duration along with the type of query
    • Queries per second along with the type of query