User-Defined Functions in CrateDB

Watch the tutorial on our Youtube channel!

In this tutorial, I will show you how to build your own User-Defined Function step-by-step.

If you are struggling with finding the proper function for your database task, this tutorial walks you through exactly what you need to do to extend your database’s functionalities.

Let’s start with step one: learn the syntax of CREATE FUNCTION.

Step 1: Learn the syntax of CREATE FUNCTION

I use the CREATE FUNCTION statement to add JavaScript functions to my database. Let’s check the general structure for this statement:

CREATE [OR REPLACE] FUNCTION function_name
    ( [ [argument_name] argument_type ] [, ...] ] )
RETURNS return_type
LANGUAGE language_name
AS 'definition'
  • CREATE FUNCTION creates a new UDF. Another option is to use CREATE OR REPLACE FUNCTION, which creates or replaces an existing function.
  • function_name is the name of the function
  • argument_name is the optional name for the argument, for documentation purposes only, and argument_type is the argument’s data type, mandatory.
  • return_type is the returned data type of the function and can be any supported type.
  • language_name is the registered language used for the function block, for example, JavaScript.
  • definition is a string defining the function’s body in the chosen language.

Step 2: Write the JS code block

Now that you’ve learned the basic structure of CREATE FUNCTION, let’s take a closer look at the function definition.

I write functions in JavaScript following the AS keyword and enclose the code in single quotes.

The arguments for the JavaScript function are taken directly from the CREATE FUNCTION statement and in the same order.

The return and argument can generally be of any supported data type. However, I must ensure the JavaScript function returns a type that can be casted to the same format as the return_type from CrateDB.

For example, let’s see how a UDF to sort an integer array ascending would look like:

CREATE OR REPLACE FUNCTION array_sort_asc (ARRAY(INTEGER))
RETURNS ARRAY(INTEGER)
LANGUAGE JAVASCRIPT
AS '
function array_sort_asc(array_integer) {
    return Array.prototype.sort.call(array_integer, (a, b) => a - b);
}
';

In this example, the Javascript function returns an array of integers, successfully casted to ARRAY(INTEGER) in CrateDB. Here, I use a built-in JS method to return a sorted array, which makes the task a piece of cake.

Step 3: Build your own UDF

Once you’ve learned how to write JS code in the CREATE FUNCTION definition, it’s time to put your own UDF together! I will now show you how to build two UDFs step by step.

For the first one, let’s say you need to filter all elements within a given range in an integer array. How would you solve this task?

First, I look for a corresponding built-in method in JS. I find the Array.prototype.filter() method, which returns an array of elements that passed some given conditions. I use this method to create the function array_filter(array_integer, min_value, max_value), which returns an array of elements from array_integer that lie within the min_value and max_value range. My JavaScript code ends up looking like this:

function array_filter(array_integer, min_value, max_value) {
    return Array.prototype.filter.call(array_integer,
        element =>
        element >= min_value && element <= max_value);
}

All left is to place this JS code into the CREATE FUNCTION statement, which must have the same arguments and return data types as the JS code. My finished UDF looks like this:

CREATE OR REPLACE FUNCTION array_filter(ARRAY(INTEGER), INTEGER, INTEGER) 
RETURNS ARRAY(INTEGER)
LANGUAGE JAVASCRIPT
AS '
function array_filter(array_integer, min_value, max_value) {
    return Array.prototype.filter.call(array_integer,
        element =>
        element >= min_value && element <= max_value);
}
';

I run this command in the Admin UI to create the function, and I can call it from now on with a SELECT statement, like so:

SELECT array_filter([5, 7, 20], 2, 8) 
-- Result: [5, 7]

And there you have it! You can call array_filter to quickly get this job done whenever you need it.

Let’s see one final example for a UDF?

You have two two-dimensional vectors and want to calculate the distance between them. How would you build a UDF to execute this task?

First, I write a JavaScript function to calculate the distance between two vectors. I can do it easily using the Pythagorean theorem.

function vector_distance(x1, y1, x2, y2) {
    return Math.sqrt(
        Math.pow((x2 - x1), 2) + Math.pow((y2 - y1), 2)
    );

Here I notice the return value is not an integer but a double precision value because of the square root operation. So in my CREATE FUNCTION statement, I set DOUBLE as the return type, and the four arguments are of type INTEGER. Now I have everything to put the UDF together, and it ends up looking like this:

CREATE OR REPLACE FUNCTION vector_distance (INTEGER, INTEGER, INTEGER, INTEGER)
RETURNS DOUBLE
LANGUAGE JAVASCRIPT
AS 'function vector_distance(x1, y1, x2, y2) {
  return Math.sqrt(
      Math.pow((x2 - x1), 2) + Math.pow((y2 - y1), 2)
  );
}';

I call vector_distance with a select statement, like so:

SELECT vector_distance(1, 3, 4, 5);
-- Result: 3.6055

And there, we have our second and final example of how to build a UDF from start to finish. You can extend your database functionalities to your needs in just a few steps!

Reference

3 Likes