Blob support removal for next version

Just a simple question: are you sure you want to completely remove blob support from CrateDB instead of improving it or keeping it in place “as is”?

I saw on GitHub that blob support will be removed for these reasons:

  • CrateDB doesn’t have a built-in backup solution for BLOB data <= Every filesystem in a production environment must have a backup system, whether it belongs to a CrateDB node or anything else.
  • Apparently it is not used a lot (by our known customers & users) <= Do you know all of your users?
  • It needs to be maintained and code has to be adapted <= …like every feature :slight_smile:
  • PostgreSQL clients have no interface to utilize BLOBs <= I don’t use PostgreSQL clients. I’m using the CrateDBAdmin.net client, dedicated to CrateDB, which supports blobs with previews. But a decision to remove a DB feature should not rely on third-party clients with limited features (clients which are not mandatory to use CrateDB).

I’m using millions of blob entries in a production environment with great ease, with secure replication/distribution over multiple nodes, and I wanted to add more applications on CrateDB… but now I have doubts. Deprecating and removing a fully working feature like this can cause unpredictable behaviour on the customer side.

No reaction at all? :slight_smile:

Hi Max,

I have seen some internal discussion on this topic and I think one of our lead engineers wanted to get back to you about it.

AFAIK, there aren’t really that many customers/users of CrateDB actually using the blob feature right now. Also, we openly communicated the plan (and pinned the issue in the repo) to get community feedback :slight_smile:

Would you be willing to share what you are using the blob feature for right now?

br
Georg

Hi Georg,

I know (now) that you openly communicated the deprecation, and one of my tech leads told me about it. I must admit I didn’t take it seriously, because the announcement seemed so strange, and I was very busy with the impact of Covid-19 on all of my customers’ applications and websites (I’m an IT manager for a big, well-known retailer in France and other European countries). I recently came back to release my CrateDB client (on my own time) and saw the announcement with my own eyes. So… tadaaa! I’m here now (but too late, it seems…).

We are using blobs to store order forms, delivery notes, message attachments and other things. Our customer database contains nearly 30 million customers, so it represents a lot of data and files. We also use CrateDB to keep track of SMS communications (thousands and thousands a day), order status changes, etc., and it fully hosts the data for some of our back offices.

CrateDB nodes allow us to replicate and secure files automatically and retrieve them without installing any extra load balancer, and this has always worked like a charm for us. I have also used products like GlusterFS for other applications, but the all-in-one solution provided by CrateDB reduces installation costs and makes blob storage synchronization easier (compared with other distributed/aggregated network filesystems). Files rarely sit on a storage system without being linked to a database entry somewhere! So keeping data and blobs in the same place is fully consistent. And the CrateDB blob distribution/replication solution is the same for Windows servers and Linux servers (we can’t say that about other distributed/replicated network filesystems). We are mainly hosted on Linux servers ourselves, but we can’t ignore Windows servers :wink:

The data filesystems of our CrateDB nodes are of course backed up every day for safety (it’s mandatory in our datacenters), but the CrateDB infrastructure already secures the whole thing well on its own.

And I must admit that running a simple SQL query that joins a blob table, and previewing the file (by hovering over the digest) associated with a communication, an order or anything else during an analysis or debugging session, is much better than being forced to retrieve the file separately via SFTP or a network share on the NAS. That was the next feature planned for my CrateDB client (digest previews in joined blob table results)…
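For readers who have not used the feature: CrateDB blob tables identify each file by the SHA-1 digest of its content, and blobs are uploaded and fetched over HTTP under `/_blobs/<table>/<digest>`. Below is a minimal Python sketch of that addressing scheme, plus the kind of join described above. The endpoint, the `attachments` blob table, the `orders` table and its `attachment_digest` column are all hypothetical names chosen for illustration, not taken from this setup.

```python
import hashlib

# Hypothetical values for illustration only; adjust to your own cluster.
BASE_URL = "http://localhost:4200"  # assumed CrateDB HTTP endpoint
BLOB_TABLE = "attachments"          # assumed blob table name


def blob_digest(content: bytes) -> str:
    """Return the SHA-1 hex digest that serves as the blob's identifier."""
    return hashlib.sha1(content).hexdigest()


def blob_url(base_url: str, table: str, content: bytes) -> str:
    """Build the URL under which this content would be uploaded or fetched."""
    return f"{base_url.rstrip('/')}/_blobs/{table}/{blob_digest(content)}"


# Hypothetical join between a regular table and the blob table.
# It assumes orders.attachment_digest stores the digest of the uploaded file.
JOIN_QUERY = """
SELECT o.id, b.digest, b.last_modified
FROM orders o
JOIN blob.attachments b ON b.digest = o.attachment_digest
"""

if __name__ == "__main__":
    # Prints the URL a client would PUT the file to (and later GET it from).
    print(blob_url(BASE_URL, BLOB_TABLE, b"example delivery note"))
```

Because the digest is derived from the content, storing it in a regular table column is enough to link a row to its file, which is what makes the join above possible.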

So … voilà !

Perhaps, instead of abandoning BLOB support, it should be extended with pg’s lo_* functions, so that any pg client with large object support (including psql) can use it.

@MikeMax

FYI:

After some more (internal) discussion, we’ve decided NOT to remove the BLOB support for now.
We agree with the use case of storing large binary objects and plan to come up with an SQL-based implementation (e.g. a BLOB data type) in the future, which should then also support backup & restore of these binary objects (no ETA at all).
Until we or a contributor can provide such an alternative solution for storing BLOBs, we do not plan to change or remove the current implementation.
But please be aware that, until such a solution is implemented, we probably won’t add the missing backup & restore support to the current implementation.

Thanks for taking the community into consideration! It’s much appreciated!

Long live CrateDB! And keep up the excellent work!!