Strictly speaking, CrateDB is not a NewSQL database, as it trades full ACID guarantees for an eventually consistent approach. CrateDB is a document store with a PostgreSQL-compatible SQL interface. Compared to PostgreSQL, however, objects are first-class citizens in CrateDB, which also brings the flexibility to work with dynamic schemas and mappings.
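To make that concrete, here is a rough sketch of a dynamic object column (the table and attribute names are invented for illustration): new nested attributes are accepted and indexed on the fly, and queried with bracket notation.

```sql
-- hypothetical table; "payload" accepts new nested attributes dynamically
CREATE TABLE readings (
  ts TIMESTAMP WITH TIME ZONE,
  payload OBJECT(DYNAMIC)
);

INSERT INTO readings (ts, payload)
VALUES (now(), '{"sensor": {"id": "s1", "temp": 21.5}}');

-- nested attributes are queryable like regular columns
SELECT payload['sensor']['id'], payload['sensor']['temp']
FROM readings
WHERE payload['sensor']['temp'] > 20;
```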
CrateDB uses Lucene as its storage/indexing engine, bringing fast distributed indexes as well as columnar storage (doc values) to the table. This allows you to run queries, including aggregations, across TiB of data and billions of rows within milliseconds.
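A typical aggregation that benefits from those columnar doc values might look like this (table and column names are made up for the example):

```sql
-- hypothetical metrics table; numeric columns are stored as doc values
CREATE TABLE metrics (
  ts TIMESTAMP WITH TIME ZONE,
  device TEXT,
  temp DOUBLE PRECISION
);

-- columnar aggregation across the whole table
SELECT date_trunc('hour', ts) AS hour,
       device,
       avg(temp) AS avg_temp,
       max(temp) AS max_temp
FROM metrics
GROUP BY date_trunc('hour', ts), device
ORDER BY 1;
```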
In short: Cockroach and Yugabyte excel in use cases that require strong consistency or geo-distributed data, while CrateDB mainly focuses on providing a solution for large-scale analytics applications.
I don’t think we have any recent white paper comparing against Yugabyte or Cockroach. Benchmarks typically depend strongly on the actual use case and require deep knowledge of all the systems being tested. I can’t really help you with Yugabyte or Cockroach, but I can definitely get you started and assist with CrateDB.
How does CrateDB compare with respect to horizontal scalability and high availability?
→ There are production workloads using up to several hundred CrateDB nodes, with single nodes often holding multiple TiB of data. High availability is a major feature of CrateDB, with highly configurable replication strategies that take the actual infrastructure (e.g. hardware zones) into account, as our customers often use it at the core of their applications (i.e. if it is not operational, the application wouldn’t be either). With the next release (4.7) we will also integrate cross-cluster replication.
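As a minimal sketch of how replication is configured at the table level (the table name and values are illustrative):

```sql
-- keep 1–2 replica copies of every shard, depending on cluster size
CREATE TABLE events (
  id TEXT,
  ts TIMESTAMP WITH TIME ZONE
) CLUSTERED INTO 6 SHARDS
WITH (number_of_replicas = '1-2');

-- the replica count can also be changed at runtime
ALTER TABLE events SET (number_of_replicas = '2');
```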
How does CrateDB compare to them in terms of administrative operations, e.g. upgrades, backups, adding/removing nodes, etc.?
→ Again, I can only speak for CrateDB here. Upgrades for minor versions are typically done in a rolling fashion, i.e. with constant availability of the cluster. Adding nodes to an existing cluster is typically as easy as spinning up another container/VM; data will automatically get redistributed across the nodes. Removing a node can be done with a decommission statement, which automatically takes care of moving its shards to the remaining nodes. Backups are realised using a repository/snapshot mechanism that allows safe delta updates, as only changed segments are transferred to the repository.
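A quick sketch of those two operations (node name, repository name, and path are placeholders):

```sql
-- gracefully remove a node: its shards move to the remaining nodes first
ALTER CLUSTER DECOMMISSION 'node-name';

-- backups: register a repository once, then snapshot incrementally
CREATE REPOSITORY backups TYPE fs WITH (location = '/mnt/backups');
CREATE SNAPSHOT backups.snapshot1 ALL WITH (wait_for_completion = true);
```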
How does CrateDB compare to them with respect to resource utilization, e.g. for X million rows, the amount of per-node RAM+CPU for such and such performance, etc.?
→ Generally speaking, CrateDB is very efficient in terms of resources, especially considering the number of indexes it maintains, and it runs mostly on commodity hardware. For example, we ran tests storing and querying 4 TiB of indexed time-series data with 2 cores and 4 GiB of memory, so with CrateDB it is possible to store large amounts of data with limited resources.
BTW, while IoT and time-series data are relevant use cases for CrateDB, we have quite a lot of users outside this space using CrateDB for analysing network traffic, streaming analytics, retail, marketing and many more.
I hope that helps you a little bit.