Exploring Blob storage in CrateDB

CrateDB can store binary large objects (BLOBs). This lets you keep binary data (e.g. product photos or PDFs) in a table that is automatically sharded and replicated across your cluster. In this tutorial, we work with a simple chat engine and store the users' profile pictures in CrateDB.

Create blob table

First, we will create a blob table, which we use to store our photos:

CREATE BLOB TABLE profile_pictures;

Insert pictures into the blob table

Even though we are using a table, you cannot use typical SQL statements (like INSERT INTO) to work with a blob table. To insert data, follow the steps below.

Before we can insert a user profile picture (or any other binary file) into the profile_pictures table, we need to calculate the SHA-1 checksum of the file.

For this you can either use the shasum command-line tool:

$> shasum /path/to/your/file.jpg
5883fb9412357b79540e9cb983cf8a323cd1b611  /path/to/your/file.jpg

Alternatively, use your programming language of choice. The following example calculates the SHA-1 checksum with Python:

$> python3 -c 'import hashlib;print(hashlib.sha1(open("/path/to/your/file.jpg","rb").read()).hexdigest())'
5883fb9412357b79540e9cb983cf8a323cd1b611

We need this SHA-1 checksum for the upload step. We can use curl to upload the file into CrateDB but any other way to submit an HTTP PUT request will do as well:

$> curl -isSX PUT '127.0.0.1:4200/_blobs/profile_pictures/5883fb9412357b79540e9cb983cf8a323cd1b611' --data-binary @/path/to/your/file.jpg
HTTP/1.1 201 Created
content-length: 0

The four variable parts of this command are:

  • the URL to your CrateDB server (“127.0.0.1:4200”)
  • the table name (“profile_pictures”)
  • the SHA-1 checksum as calculated for your file (in this example “5883fb9412357b79540e9cb983cf8a323cd1b611”)
  • the filepath (“/path/to/your/file.jpg”)

These need to be replaced to match your values.

If the upload succeeds, the response shown in your console starts with the status line HTTP/1.1 201 Created.

Querying your profile pictures

You can use the following query to access some metadata for your blobs:

SELECT * FROM blob.profile_pictures;

This query returns the digest (the SHA-1 checksum we calculated before) and the last-modified date of every uploaded file. To access a file, use the same URL you used for the PUT request above (“127.0.0.1:4200/_blobs/profile_pictures/5883fb9412357b79540e9cb983cf8a323cd1b611”). For example, you could use this URL, prefixed with http://, directly in an <img> HTML tag.

Updating an existing blob

It’s not possible to update an existing blob. If you want to replace a profile picture you would instead upload the new picture and delete the old entry.

To upload the new file, repeat the steps above: calculate the SHA-1 checksum of the new file and upload it with HTTP PUT. The old file can then be deleted with the following HTTP DELETE request (again, replace the URL, table name, and SHA-1 checksum):

$> curl -isS -XDELETE '127.0.0.1:4200/_blobs/profile_pictures/5883fb9412357b79540e9cb983cf8a323cd1b611'

Additional Reading

It’s worth checking out the documentation for blobs for additional information we did not cover in this tutorial.
