Querying CrateDB with DataStation

DataStation is a powerful IDE for querying and analyzing data. It supports data imports from databases, REST APIs, and files. In this tutorial, we will illustrate how to run DataStation with your local CrateDB instance.

Install DataStation

To install the DataStation Desktop download the pre-built binary for your operating system. After successful installation you should get to the main page:

Now add a new page called crate-test that we will use for querying and visualizing data from CrateDB.

Import sample data to CrateDB

One of the easiest ways to get started with CrateDB is to run it on Docker:

docker run --publish=4200:4200 --publish=5432:5432 crate

Now, let’s import New York City taxi dataset to your CrateDB instance. First, create a nyc_taxi table:

CREATE TABLE "nyc_taxi" (
"congestion_surcharge" REAL, 
"dolocationid" INTEGER, 
"extra" REAL, 
"fare_amount" REAL, 
"improvement_surcharge" REAL, 
"mta_tax" REAL, 
"passenger_count" INTEGER, 
"payment_type" INTEGER, 
"pickup_datetime" TIMESTAMP WITH TIME ZONE, 
"pulocationid" INTEGER, 
"ratecodeid" INTEGER, 
"store_and_fwd_flag" TEXT, 
"tip_amount" REAL, 
"tolls_amount" REAL, 
"total_amount" REAL, 
"trip_distance" REAL, 
"vendorid" INTEGER) 

Then, import data with COPY FROM statement:

COPY "nyc_taxi" FROM 'https://s3.amazonaws.com/crate.sampledata/nyc.yellowcab/yc.2019.07.gz' WITH (compression = 'gzip');

It will take a few minutes until all data are loaded and available in your CrateDB instance.

Data source setup

Inside DataStation create a new data source in the left sidebar. Choose CrateDB as a source database and add connection details:

If you are connecting to localhost you can leave the Host field blank. However, when you are connecting to a PostgreSQL wire-compatible database in DataStation (including CrateDB) without TLS, you have to opt out of TLS by specifying sslmode=disable in the host string. For the database name and username specify crate and leave the password blank.

Panel setup

Before querying and visualizing data create a new panel by hitting the Add Panel button:

Now, let’s calculate the average trip distance for New York taxi data. In the panel choose CrateDB source, in the query field enter SELECT AVG(trip_distance) FROM "doc"."nyc_taxi" and hit the Run button:

As you can see the result of the query appears in the Preview field.

Display data

To display data in a tabular format, let’s first run a more interesting query. In the following example, we want to find the number of taxi trips per hour. This can be done by running the query:

SELECT date_trunc('hour', pickup_datetime) AS hour, count(*) AS trips
FROM "doc"."nyc_taxi"
GROUP BY 1
ORDER BY 1;

The output in JSON format can be seen in the Preview field:

To show results in a tabular format, next to the Add Panel button hit the table icon to generate a table based on the results:

From this point, you are ready to import other datasets and visualize query results with tables and charts. If you like this tutorial and would like to learn more about CrateDB in general visit our docs and join our community.

3 Likes