Introduction to using Azure Databricks with CrateDB

This is a quick introduction to getting started with Azure Databricks and CrateDB.

Set up Azure Databricks

  1. Add a new Databricks service to your Azure subscription
  2. Once the service has been created, select “Launch Workspace”
  3. After signing in to Azure Databricks, use the common task “New Cluster” to start a cluster that will run your Spark jobs
  4. Install the pgjdbc library from Maven on your cluster (at the time of publishing, org.postgresql:postgresql:42.2.23)

[Screenshot: installing the library on the Azure Databricks cluster]
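
The library in step 4 is identified by a Maven coordinate of the form group:artifact:version. As a small illustration of how that string breaks down, here is a sketch in Python. The names `MavenCoordinate` and `parse_maven_coordinate` are made up for this example and are not part of any Databricks or Maven API.

```python
from typing import NamedTuple


class MavenCoordinate(NamedTuple):
    """Hypothetical container for the three parts of a Maven coordinate."""
    group_id: str
    artifact_id: str
    version: str


def parse_maven_coordinate(coordinate: str) -> MavenCoordinate:
    """Split a 'group:artifact:version' string into its three parts."""
    parts = coordinate.split(":")
    if len(parts) != 3:
        raise ValueError(f"expected group:artifact:version, got {coordinate!r}")
    return MavenCoordinate(*parts)


# The coordinate named in step 4 above
pgjdbc = parse_maven_coordinate("org.postgresql:postgresql:42.2.23")
print(pgjdbc.group_id, pgjdbc.artifact_id, pgjdbc.version)
```

A newer driver release would change only the last part of the coordinate.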

Connect to CrateDB: Scala example

  1. Create a new notebook with default language Scala
  2. Add the following code and run the notebook
// Connection details: replace the placeholders with your own values
val crateUsername = "<username>"
val cratePassword = "<password>"
val postgresqlUrl = "jdbc:postgresql://<url-to-server>:5432/?sslmode=require"
val tableName = "<tablename>"

// Read the table into a DataFrame through the PostgreSQL JDBC driver
val jdbcDF = spark.read
  .format("jdbc")
  .option("url", postgresqlUrl)
  .option("driver", "org.postgresql.Driver")
  .option("dbtable", tableName)
  .option("user", crateUsername)
  .option("password", cratePassword)
  .option("fetchsize", 100000) // rows fetched per round trip
  .load()

// Return the first 10 rows
jdbcDF.head(10)
  3. You should see the results from CrateDB

Connect to CrateDB: Python example

  1. Create a new notebook with default language Python
  2. Add the following code and run the notebook
# Connection details: replace the placeholders with your own values
crateUsername = "<username>"
cratePassword = "<password>"
postgresqlUrl = "jdbc:postgresql://<url-to-server>:5432/?sslmode=require"
tableName = "<tablename>"

# Read the table into a DataFrame through the PostgreSQL JDBC driver
jdbcDF = spark.read \
    .format("jdbc") \
    .option("url", postgresqlUrl) \
    .option("driver", "org.postgresql.Driver") \
    .option("dbtable", tableName) \
    .option("user", crateUsername) \
    .option("password", cratePassword) \
    .load()

# Return the first 10 rows
jdbcDF.head(n=10)
  3. You should see the results from CrateDB
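
The connection settings used in both notebooks can be collected in one place. The following is a minimal sketch, not an official CrateDB or Databricks helper: the function name `crate_jdbc_options` and its parameters are hypothetical. The option keys (`url`, `driver`, `dbtable`, `user`, `password`, `fetchsize`) are the same Spark JDBC options used in the examples above.

```python
def crate_jdbc_options(host, table, user, password,
                       port=5432, sslmode="require", fetchsize=None):
    """Build the Spark JDBC options used in the examples above.

    Hypothetical helper: the name and signature are assumptions
    made for this sketch.
    """
    options = {
        "url": f"jdbc:postgresql://{host}:{port}/?sslmode={sslmode}",
        "driver": "org.postgresql.Driver",
        "dbtable": table,
        "user": user,
        "password": password,
    }
    if fetchsize is not None:
        # Stored as a string so every option value has the same type
        options["fetchsize"] = str(fetchsize)
    return options


# Same placeholders as in the notebooks above
options = crate_jdbc_options(
    "<url-to-server>", "<tablename>", "<username>", "<password>",
    fetchsize=100000,
)
print(options["url"])
```

In a Databricks notebook, where `spark` is already defined, the dictionary can be passed to the reader with `spark.read.format("jdbc").options(**options).load()`.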