Objects in CrateDB

In this tutorial, you will learn the basics of CrateDB Objects. This tutorial is also available in the video format: CrateDB Video | Fundamentals: Getting Started with CrateDB Objects

First, I present a simple use case to demonstrate how CrateDB Objects can add clarity to data models.

Then, I give an overview of the Column Policies for CrateDB Objects: dynamic, strict, and ignored. I also provide examples of how these policies affect INSERT statements.

Finally, I show you how to INSERT, UPDATE and DELETE records with the Object datatype.

Setup and Intro to Object Type

Something unique about CrateDB is that it has OBJECT among the data types. Objects grant considerable versatility to data models and can be manipulated using SQL, thanks to CrateDB’s familiar PostgreSQL interface.

But when do I use OBJECT? Let’s see an example where OBJECT fits like a glove.

Let’s imagine a system with a shipping address and an invoice address. Each address has attributes like name, street, city, etc. Without Objects, I would probably set it up like this:

shipping_name, shipping_street, shipping_city, and then also invoice_name, invoice_street, invoice_city

This looks a bit repetitive and disorganized.

Imagine you could group the attributes into an invoice and a shipping Object. The system would then look like this:

invoice: {name: ..., street: ..., city: ...} and shipping: {name: ..., street: ..., city: ...}

looking much more neat and clear.

This is one example of how Objects can add clarity to data models. Throughout this tutorial, I’ll work extensively with Objects, so stay tuned for more examples!

But before anything else, let’s start CrateDB. In our previous tutorial, I showed you how to get started with CrateDB for the first time using Docker. Today I’m following that same method, so I open my terminal and run the following command:

docker run --publish=4200:4200 --publish=5433:5432 --env CRATE_HEAP_SIZE=1g crate:latest

If you wish to learn more about getting started with CrateDB, check out our Getting Started with CrateDB video and the Getting Started documentation at crate.io.

Now that CrateDB is running, I navigate to localhost:4200 and land on the CrateDB Admin UI.

So while I set up a table for this tutorial, I’ll give you an overview of the OBJECT syntax for creating tables and executing queries.

First, I create a quotes table, where I’ll store some literary quotes and information about the title and protagonist.

CREATE TABLE quotes (
  title TEXT, 
  quotation OBJECT,
  protagonist OBJECT(STRICT) AS (
    surname TEXT, 
    first_name TEXT, 
    details OBJECT AS (
      age integer, 
      birthday TIMESTAMP
    )
  )
);

Let’s break up this statement:

  • I set the title as a usual TEXT type
  • I put the quotation as an OBJECT - by default, set to DYNAMIC
  • Finally, I store the protagonist as an OBJECT, but with the STRICT column policy and some pre-defined key attributes - surname, first_name, details.

So, what is the difference between the quotation and protagonist objects? Let’s look at how objects and their column policies work in CrateDB.

Objects and the Column Policy

CrateDB structures OBJECT as a collection of key-value pairs, where an OBJECT can contain any other type, including child OBJECTs. Moreover, an OBJECT column can be schemaless or have a fixed sub-column structure.

In CrateDB, there are three possible policies: DYNAMIC, STRICT, or IGNORED, and I’ll tackle them in the coming examples in the tutorial. But before that, what are the main differences between them?

  • The default DYNAMIC policy dynamically allows inserts to add new subcolumns to the object definition. It grants the highest flexibility to the OBJECT column, and new columns are usable like any other subcolumn. You can retrieve them, sort by them, and use them in where clauses.
  • The STRICT policy works together with a column definition. CrateDB will only accept inserts that strictly obey that definition.
  • Finally, the IGNORED policy will only index the sub-column values that match the definition and ignore those that don’t. I use this policy when there is a sub-column definition I want to keep, but inserts may have different sub-columns as previously defined.

So now that I have my quotes table ready and learned the basics of CrateDB OBJECT, I can move on to the first INSERT in the table.

INSERT

With the INSERT statement, I can add data to my CrateDB tables. I can insert a single row of values, bulk-insert several rows at once, and even insert queried values directly.

First, I’ll populate my quotes table for the first time with a single insert.

INSERT INTO quotes (
    title,
    quotation,
    protagonist
) VALUES (
    'Alice in Wonderland',
    {
        "words" = 'Curiouser and curiouser!',
        "length" = 3
    },
    {
        "surname" = 'Pleasance Liddell',
        "first_name" = 'Alice',
        "details" = {
            "age" = 7,
            "birthday" = '1852-05-04T00:00Z'::TIMESTAMPTZ
        }
    }
);

Here, I insert the quotation and protagonist values as JSON strings, but it’s worth saying they are NOT JSON objects.

The difference between inserting values for the quotation object and the strict protagonist object is evident here.

  • In the quotation column, I insert an object with two sub-columns, words and length, which were never defined before. CrateDB accepts this insert as quotation has a DYNAMIC column policy and dynamically adds these subcolumns to the OBJECT definition.
  • CrateDB accepts this insert for the protagonist column because it strictly follows the OBJECT definition from the CREATE TABLE statement.

Let’s see a different example for INSERT that CrateDB accepts:

INSERT INTO quotes (
    title,
    quotation,
    protagonist
) VALUES (
    'Alice in Wonderland',
    {
        "words" = 'Curiouser and curiouser!',
        "meaning" = 'Increasingly strange'
    },
    {
        "first_name" = 'Alice',
        "details" = {
            "birthday" = '1852-05-04T00:00Z'::TIMESTAMPTZ
        }
    }
);

This INSERT is successful in CrateDB, although it has some differences from my previous statement.

  • I have added a meaning sub-column to the quotation, which works because the quotation object has a DYNAMICpolicy.
  • Also, I have not given the protagonist object a surname or an age, although these attributes were given on the original STRICT object definition. This works because, in this case, CrateDB will provide a NULL value for these sub-columns.

Finally, let’s do an INSERT that CrateDB will decline:

INSERT INTO quotes (
    title,
    quotation,
    protagonist
) VALUES (
    'Alice in Wonderland',
    {
        "words" = 'Curiouser and curiouser!',
        "meaning" = 'Increasingly strange'
    },
    {
        "age" = '7',
        "first_name" = 'Alice',
        "details" = {
            "birthday" = '1852-05-04T00:00Z'::TIMESTAMPTZ
        }
    }
);

In this example, I attempted to insert the age attribute outside the details sub-column, which is different from my STRICT object definition. CrateDB returns the following error:

Error!

StrictDynamicMappingException[mapping set to strict, dynamic introduction of [age] 
within [protagonist] is not allowed]

I will now query my quotes table to check what it looks like so far:

SELECT
    title AS title,
    protagonist['first_name'] AS name,
    date_format(
        '%D %b %Y',
        'GMT',
        protagonist['details']['birthday']
     ) AS born,
    quotation['words'] AS quote
FROM quotes limit 100;

And I will also create a VIEW for this same query, as I’ll repeatedly use it during this tutorial. This will be very handy for checking the results from the following inserts.

create view general_information as 
SELECT
    title AS title,
    protagonist['first_name'] AS name,
    date_format(
        '%D %b %Y',
        'GMT',
        protagonist['details']['birthday']
     ) AS born,
    quotation['words'] AS quote
FROM quotes limit 100;

Now that the VIEW is ready let’s check some other INSERT options.

For instance, I can insert several values simultaneously with a bulk INSERT. Each row of values is inside brackets, and commas separate the different rows.

INSERT INTO quotes (
    title,
    quotation,
    protagonist
) VALUES 
    (
    'Slaughterhouse-Five',
    {
        "words" = 'Everything was beautiful, and nothing hurt.'
    },
    {
        "surname" = 'Pilgrim',
        "first_name" = 'Billy',
        "details" = {
            "birthday" = '1922-07-04T00:00Z'
        }
    }
    ), 
    (
    'The Complete Tales of Winnie-the-Pooh',
    {
        "words" = 'How lucky I am to have something that makes saying goodbye so hard.'
    },
    {
        "first_name" = 'Winnie-the-Pooh',
        "details" = {
            "birthday" = '1926-10-14T00:00Z'
        }
    }
    ), 
    ( 
    'The Complete Tales of Winnie-the-Pooh',
    {
        "words" = 'I am short, fat, and proud of that.'
    },
    {
        "first_name" = 'Winnie-the-Pooh',
        "details" = {
            "birthday" = '1926-10-14T00:00Z'
        }
    }
);

Let’s now check the state of the quotes table by querying the general_information view.

SELECT title, name, born, quote
FROM "doc"."general_information"
LIMIT 100;

I can also insert data directly from a query. For instance, I want to create a table to store my favorite quotes.

CREATE TABLE favorite_quotes (
  title TEXT,
  quotation TEXT, 
  protagonist TEXT 
);

And as a big Winnie-the-Pooh fan, I want to add all its quotes to my favorite_quotes table. To do that, I can query the quotes table, filter for Winnie-the-Pooh results, and then insert the result directly into the favorite_quotes table.

INSERT INTO favorite_quotes (title, quotation, protagonist)
  (SELECT title,
          quotation['words'] AS quotation,
          protagonist['first_name'] AS protagonist
   FROM quotes
   WHERE protagonist['first_name'] = 'Winnie-the-Pooh' );

UPDATE

While doing my research for this tutorial, I stumbled across two possible birth dates for “Winnie-the-Pooh.” October 14th, 1926, when the book “Winnie-the-Pooh” was first published, and on August 21st, 1921, Pooh was gifted to Christopher Robin on his first birthday.

I used the former date as the birthday, but now I want to update all Winnie-the-Pooh records to the other date. I can do that with the UPDATE statement. Here, I specify the table and attribute to be updated and apply it to all records where the first name is “Winnie-the-Pooh.”

UPDATE quotes
SET protagonist['details']['birthday'] = '1921-08-21T00:00Z'
WHERE protagonist['first_name'] = 'Winnie-the-Pooh';

DELETE

Deleting rows in CrateDB is done with the DELETE statement, where we specify the table and conditions to delete. For example, if I want to delete all records with the “Alice in Wonderland” title, I run the following statement:

DELETE
FROM quotes
WHERE title = 'Alice in Wonderland';
1 Like