Cluster checks failing: Recreate tables

I recently upgraded our Crate cluster from v2.3.11 to v3.3.5. As expected, many (but not all) of my tables are failing the cluster check “The following tables need to be recreated for compatibility with future major versions of CrateDB:”.
In preparation for the next upgrade (v4.5+), I need to go ahead and recreate these tables. I am using this command “OPTIMIZE TABLE xxx WITH (upgrade_segments=true);” as recommended in the documentation (Optimization — CrateDB: Reference).
However, after running this against any particular table in my schema, the table still appears in the list of tables needing an upgrade. When I check the table's metadata (information_schema.tables), the version['created'] for that table is still “cratedb : 2.3.11”.
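
For reference, this is the sort of query I am checking with (doc.my_table stands in for one of my actual table names):

    SELECT table_schema, table_name, version['created']
    FROM information_schema.tables
    WHERE table_schema = 'doc' AND table_name = 'my_table';
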
What have I missed here? I have almost a terabyte of data to upgrade, and copying it to new tables would cause some extreme downtime (unacceptable!). The upgrade_segments option looked like a dream come true, but it doesn't do what it is specifically designed to do, namely “Don't optimize but instead upgrade all segments of the table/partition to the current version of storage engine.”
I would appreciate any help and advice you can give!
Thanks,
Dennis

Hi @denniswi,

Unfortunately, upgrading segments does not help in your case.

Segments are only a subset of the CrateDB/Lucene data store. Upgrading the segments does not address other major index structure changes. To apply those, tables have to (very occasionally) be re-created.

Given you have enough storage available in your CrateDB cluster, you should be able to re-create the table without downtime though. You will need to accommodate newly modified data (new inserts, updates, deletes) during re-creation, until the switch over to the new table can be done.

If you don't have enough storage to duplicate your table, you can potentially do a segmented migration: re-create one partition after another and, after every migration step, drop the partition in the old table.
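
As a rough sketch of one such step (schema, table and column names are only placeholders for your actual setup):

    -- copy one old partition into the newly created table
    INSERT INTO doc.my_table_new (id, week_start, payload)
        SELECT id, week_start, payload
        FROM doc.my_table
        WHERE week_start = '2019-01-07';

    -- deleting by the exact partition value removes that whole partition from the old table
    DELETE FROM doc.my_table WHERE week_start = '2019-01-07';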

Please reach out if you have further questions
Best regards,
Johannes

@jayeff, Thank you for your quick response. May I suggest that the documentation I linked to be clarified with your explanation?

Also,

Can you give a little more detail here? I have about 200 tables. Some are small and pretty static; I'm not worried about them. Others have billions of rows, organized in weekly partitions, with tens of thousands of records inserted per minute.
It is these large tables that concern me - by the time the latest partition is copied, I may have missed hundreds of thousands of inserts. What do you suggest for this scenario?

Thank you again for your help!

Dennis

Hi @denniswi,

Every migration story can be a bit different, but I hope I can give you some pointers.

Can I assume that the newly inserted records are all written to the latest partition? In this case you can safely migrate all old partitions to a new table. This process can run in parallel while new records are being written to the old table.

Once all old records are backfilled, you are left with only the most recent partition to migrate. Here you can opt for a short downtime to migrate the last partition and switch over the tables. If you want to further reduce the required downtime, do this on a Monday morning, when you have a fresh partition with just a small number of records.
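
The switch over itself can then be done with ALTER CLUSTER SWAP TABLE; a minimal sketch (table names are placeholders):

    -- exchange the names of the two tables, so that clients
    -- querying doc.my_table now hit the re-created table
    ALTER CLUSTER SWAP TABLE doc.my_table_new TO doc.my_table;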

You also have an option for a zero-downtime table swap, but this will require some additional planning and work. For example, you could deploy a code change which writes new records to both the new and the old table. In parallel, backfill the new table from the old table. Once you reach record parity, you have the chance to run some checks (completeness, performance, etc.) on the new table. If everything looks good, do a switch over to the new table.
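
A simple parity check could, for example, compare the record counts per partition in both tables (names are again just placeholders):

    SELECT week_start, count(*) AS records
    FROM doc.my_table
    GROUP BY week_start ORDER BY week_start;

    SELECT week_start, count(*) AS records
    FROM doc.my_table_new
    GROUP BY week_start ORDER BY week_start;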

Hope this helps

Are there plans to address this shortcoming in future upgrades? This is a huge pain point and a very difficult workflow when you have a lot of data.

Hi @denniswi,

I understand your point.

There are no near-future plans to have something built in to solve this, though, as there is no one-size-fits-all approach. As mentioned, this is an operation which is only necessary very occasionally. We are aware of the overhead implied by a structure change and only require it when there is no way around it.

We are still stuck on Crate v3.3.5

Our final solution was to add functionality to our application.
For each table that needs to be recreated:

  1. Clone the table structure to a *_TMP table
  2. Intercept all DDL and DML operations to both tables
  3. Copy the records from the original table into the _TMP table by partition
  4. Verify the data was accurately copied
  5. Swap the _TMP table and the original table
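
For step 5, the swap itself boils down to something like the following (our real table names differ; the replica handling is explained below):

    -- remove replicas on both tables before the swap (see note below)
    ALTER TABLE doc.my_table_tmp SET (number_of_replicas = 0);
    ALTER TABLE doc.my_table SET (number_of_replicas = 0);
    -- swap the names so the copied data ends up under the original table name
    ALTER CLUSTER SWAP TABLE doc.my_table_tmp TO doc.my_table;
    -- restore replicas afterwards (1 is just an example value)
    ALTER TABLE doc.my_table SET (number_of_replicas = 1);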

It is working pretty well, but in order to perform step 5, we have to set number_of_replicas='0'. This works on 90% of our tables.
Some tables, however, fail with this error:

SQL state [XX000]; error code [0]; ERROR: MapperParsingException:
Failed to parse mapping [default]: Could not convert [date.index] to boolean;
nested exception is io.crate.shade.org.postgresql.util.PSQLException:
ERROR: MapperParsingException: Failed to parse mapping [default]:
Could not convert [date.index] to boolean

I think it may be related to the shards of the table having differing min_lucene_version values: some are 6.6.1 and others 7.5.0.
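
For what it is worth, I am checking the segment versions per shard with a query along these lines (my_table is a placeholder):

    SELECT schema_name, table_name, min_lucene_version, count(*) AS shards
    FROM sys.shards
    WHERE table_name = 'my_table'
    GROUP BY schema_name, table_name, min_lucene_version;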

Has anyone else reported this error? Any ideas for correcting it?

Thanks!
Dennis

Hi Dennis,
My initial thinking here would be that this is a problem with the old table.
Do you think you may need to keep it?
Otherwise maybe you could take a snapshot of it, drop it, recreate it empty, and then use the drop_source option of SWAP TABLE?
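
In rough SQL terms (repository and table names are only examples, and please double check the drop_source behaviour in the ALTER CLUSTER documentation for your version):

    -- keep a copy of the problematic old table, just in case
    -- (my_repo has to be an existing repository, see CREATE REPOSITORY)
    CREATE SNAPSHOT my_repo.my_table_backup TABLE doc.my_table
        WITH (wait_for_completion = true);

    -- drop the old table and recreate it empty with the same DDL
    -- (take the statement from SHOW CREATE TABLE doc.my_table before dropping)
    DROP TABLE doc.my_table;
    -- CREATE TABLE doc.my_table (... same schema ...);

    -- swap the _TMP table into place; drop_source removes the table
    -- that is no longer needed after the swap
    ALTER CLUSTER SWAP TABLE doc.my_table_tmp TO doc.my_table
        WITH (drop_source = true);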

Hi Dennis,
Sorry, it's me again; I was just rereading the history of the thread.
Depending on how you implemented the “intercept” of DDL and DML commands, and considering there are a lot of versions to go between 3.3.5 and 5.3.0, maybe it would be worth considering doing this migration straight into a new 5.3.0 cluster?
This would not necessarily mean temporary infrastructure: depending on the situation, the new cluster could be deployed in parallel and resources could be repurposed a bit at a time as the migration progresses.

Thank you @hernanc for your response.
This is an online production database with a hundred thousand active users and billions of records, so any downtime will be a serious issue. I don’t think this suggestion will be acceptable.
Also, our corporate policy is to only upgrade production databases to major version -1 (in this case we will only upgrade to Crate 4.8).

How would a migration to 5.3.0 even work? My tables were upgraded to 2.3 some time ago; how would I upgrade them to 5.x or even 4.x directly?

Dennis

Hi,
To perform the version jump, what I was alluding to is the scenario where the copy step in your 5-point plan becomes something like COPY TO / COPY FROM or cr8 insert-from-sql, and then the swap step is about repointing the clients to the new cluster.
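
For example, with COPY (all paths and names here are only illustrative):

    -- on the old 3.3.5 cluster: export the table (or one partition at a time) to files
    COPY doc.my_table TO DIRECTORY '/data/export/my_table';

    -- on the new cluster: create the table first, then import the exported files
    -- (with file:// URIs each node reads its own local files, so a shared
    -- volume or an object storage URI may be more convenient)
    COPY doc.my_table FROM 'file:///data/export/my_table/*';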