_version does not uniquely identify a particular version of a row

peterCR8 · October 16, 2018, 4:41am

Hi there!

I’ve been looking in to crate.io recently, and I’ve found something that feels a little bit surprising. If you perform a number of concurrent blind updates to a 0.54.9-1~jessie cluster which experiences a network partition, it’s possible for two reads of a single row to have the same _version but different values.

For instance, here are two primary-key reads ( select (value, _version) from registers where id = ? ) which returned different values for the same version of the row:

Or repeated reads which do not agree:

Crate’s optimistic concurrency control docs say things like “This [version] is increased by 1 on every update” and “Querying for the correct _version ensures that no concurrent update has taken place.” This suggests to me that _version should uniquely identify a version of a row. This undermines the safety of Crate’s concurrency model: even if conditional updates based on _version are safe, if clients can’t agree on what value a particular _version identifies, there’s no way to avoid concurrency anomalies. I imagine lost updates might be possible. This might also affect the safety of SQL update statements which do not rely explicitly on _version: for instance, UPDATE foo SET visits = visits + 1 , but I haven’t tested those yet.

You can reproduce this behavior by cloning Jepsen at b25e636f and running lein test in crate/ , with the standard five-node setup; see Jepsen’s docs for details. Or, you should be able to reproduce them yourself, by having five clients, each bound to one host of a five-node cluster, perform concurrent writes to a single row, and having five more clients perform concurrent reads, recording their _version s. A ~200 second network partition isolating each node from two of its neighbors, forming an overlapping ring topology, appears to be sufficient to induce this behavior–but that’s literally the first failure mode I tried, so there may be simpler ones.

As advised, I’m using an explicit expected-nodes count, majority values for all minimum-master-limits in the config file, and I’ve lowered some timeouts to speed up the testing process. The table is a simple (pkey, value) table, replicated to all nodes.

I suspect this issue stems from, (and also affects) whatever underlying ElasticSearch version you’re using, but it’s possible those problems have been resolved in 5.0.0. As a courtesy to your customers, may I recommend you adopt their resiliency status as a part of your documentation, so users know what behaviors they can expect?

Topic		Replies	Views
Update by primary key should return either the updated document or the "_version" value CrateDB	1	541	September 13, 2023
Cannot delete row just created CrateDB	2	629	May 15, 2021
Auto increment primary key to identify rows insertion order CrateDB	1	337	March 20, 2023
Table reindexing after version upgrade CrateDB	7	542	October 4, 2022
HTTP connection - caching, or flush data? Inconsistencies CrateDB	1	48	March 30, 2024

_version does not uniquely identify a particular version of a row

Related Topics