Today we have updated our Crate cluster to version 5.0.1 from 4.8 and a new cluster check info has appear telling “The following tables need to be recreated for compatibility with future major versions of CrateDB”. The link to the documentation is System information — CrateDB: Reference where it tells I need to reindex all tables to manintain compatibility in future versions.
Following the instructions first I haver created a new temporary table using current table “show create table”
CREATE TABLE IF NOT EXISTS "demo"."ts_real_raw" ( "real_id" BIGINT NOT NULL, "date" TIMESTAMP WITHOUT TIME ZONE NOT NULL, "value" DOUBLE PRECISION, "value_text" TEXT, "modification_date" TIMESTAMP WITHOUT TIME ZONE GENERATED ALWAYS AS _cast(_cast(current_timestamp, 'timestamp without time zone'), 'timestamp without time zone'), "year" INTEGER GENERATED ALWAYS AS EXTRACT(YEAR FROM "date") NOT NULL, PRIMARY KEY ("real_id", "date", "year") ) CLUSTERED INTO 4 SHARDS WITH ( "allocation.max_retries" = 5, "blocks.metadata" = false, "blocks.read" = false, "blocks.read_only" = false, "blocks.read_only_allow_delete" = false, "blocks.write" = false, codec = 'default', column_policy = 'dynamic', "mapping.total_fields.limit" = 1000, max_ngram_diff = 1, max_shingle_diff = 3, number_of_replicas = '0-1', "routing.allocation.enable" = 'all', "routing.allocation.total_shards_per_node" = -1, "store.type" = 'fs', "translog.durability" = 'REQUEST', "translog.flush_threshold_size" = 536870912, "translog.sync_interval" = 5000, "unassigned.node_left.delayed_timeout" = 60000, "write.wait_for_active_shards" = '1' )
Then I have changed the order of the steps in the documentation since we are adding data in live, so first I have swaped the table and then try to copy the data.
alter cluster swap table demo.ts_real_raw2 to demo.ts_real_raw
insert into demo.ts_real_raw (real_id, date, value, value_text) (select real_id, date, value, value_text from demo.ts_real_raw2);
And then executed the insert query:
insert into bemp_demo.ts_real_raw (real_id, date, value, value_text) (select real_id, date, value, value_text from bemp_demo.ts_real_raw2);
I have removed autogenerated modification_date and year columns since they are autogenerated.
After executing this query around 800 registries have been copied but now the task is stuck for several hours. The cluster activity (disk, memory and cpu) is at normal working levels (no overload) so seems it is doing nothing.
Listing active jobs shows the insertion job:
select * from sys.jobs;
The original table has 83.1 Million records and it is about 4.3 GiB. There is no error in Crate logs, I have tried running the query several times and all tries appear in job list, but the record count does not increment.
¿How can I debug what is happening?
More info: I am using web interface to execute queries. KILL commands does not terminate the jobs, after restaring the cluster and trying again the same happens, only a few records are copied and the job seems stuck.