Questions around translog

Hello Crate team,
Here are a few questions about the translog files generated during insert operations. There are no update or delete operations.

  1. Does the disk space per partition shown in the Crate UI include the disk space taken by all shards (primary + non-primary) and by the corresponding translog files?
  2. I have seen that translog files are not cleaned up when replication is enabled, even after record ingestion has stopped. With 1 TB of data per partition, the UI shows 3 TB of disk space consumed, and on the file system I can see that the translog directories take up almost 1 TB. Even after ingestion into this partition is finished and more than 48 hours have passed since the last record was ingested, its translog is still not truncated.
  3. When translog_stats['uncommitted_operations'] > 0 for a shard in the sys.shards table, does that mean those operations have not been written to Lucene yet, so that my query may not return correct results? Or are the translog files consulted when a query is executed, which could degrade query performance?
  4. If translog.durability = request, then there will not be any translog files; translog files are generated only when this value is set to async. Is that a correct understanding?
  5. Is there a command I can run to clean up the translog files? Or is there a property that prevents the translog files from growing beyond a limit?

Thanks in advance.

Regarding question 1: only the primary shards are counted, but including their translog.

Regarding question 2: this potentially sounds like a bug. By replication, are you referring to shard replicas or to logical replication?
Can you run a REFRESH and an OPTIMIZE? Could you also describe the setup in more detail?

What does the following query return?

```sql
SELECT
    table_name,
    partition_ident,
    sum(seq_no_stats['max_seq_no']) AS max_seq_no,
    sum(seq_no_stats['global_checkpoint']) AS global_checkpoint,
    count(*)
FROM sys.shards
WHERE translog_stats['uncommitted_operations'] > 0
GROUP BY table_name, partition_ident
HAVING sum(seq_no_stats['max_seq_no']) > sum(seq_no_stats['global_checkpoint'])
ORDER BY 1, 2
LIMIT 100;
```
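Alongside that diagnostic, a per-partition breakdown of shard and translog sizes can help with questions 1 to 3. This is a sketch only: `doc.events` is a hypothetical table name, and the columns used are the `sys.shards` fields already referenced in this thread (`translog_stats`, `partition_ident`) plus `primary` and `size`. Whether `size` itself includes the translog is not established here, so both are listed separately.

```sql
-- Hypothetical table doc.events; replace schema/table names with your own.
-- Splits each partition into primary vs. replica shards and sums their
-- reported shard size, translog size and uncommitted translog operations.
SELECT
    partition_ident,
    "primary",
    count(*) AS shards,
    sum(size) AS shard_size_bytes,
    sum(translog_stats['size']) AS translog_bytes,
    sum(translog_stats['uncommitted_operations']) AS uncommitted_ops
FROM sys.shards
WHERE schema_name = 'doc' AND table_name = 'events'
GROUP BY partition_ident, "primary"
ORDER BY partition_ident, "primary";
```

A replica row whose `translog_bytes` stays large long after ingestion has stopped would match the symptom described in question 2.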

> If translog.durability = request, then there will not be any translog files; translog files are generated only when this value is set to async. Is that a correct understanding?

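No, in both cases translogs are written. The difference is when they are synced (fsync'd) to disk: after every request with `request`, or periodically in the background with `async`.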
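For reference, a minimal sketch of where this setting lives, assuming a hypothetical partitioned table `doc.events` (table and column names are made up for illustration). `"translog.durability"` accepts `'REQUEST'` (the default) or `'ASYNC'`; either way a translog is written, and only the fsync timing changes.

```sql
-- Hypothetical table; names are illustrative only.
-- With 'ASYNC', the translog is fsync'd in the background every
-- translog.sync_interval instead of after each request, so operations
-- acknowledged within that window can be lost on a crash.
CREATE TABLE doc.events (
    ts TIMESTAMP WITH TIME ZONE,
    day TIMESTAMP WITH TIME ZONE GENERATED ALWAYS AS date_trunc('day', ts),
    payload OBJECT
) PARTITIONED BY (day)
WITH ("translog.durability" = 'ASYNC');
```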

> Is there a command I can run to clean up the translog files? Or is there a property that prevents the translog files from growing beyond a limit?

No, not directly. If translogs are kept, there is a reason for it. A REFRESH or OPTIMIZE can potentially release them.
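A sketch of running both statements against a single partition, assuming a hypothetical table `doc.events` partitioned by a `day` column (substitute your own table and partition value). `flush = true` is already the default for OPTIMIZE and is spelled out only for clarity.

```sql
-- Hypothetical table and partition value; adjust to your schema.
-- Make recently indexed operations visible to searches.
REFRESH TABLE doc.events PARTITION (day = '2022-05-01');

-- Merge segments, then flush (commit the Lucene index), which may allow
-- older translog generations to be released if nothing else retains them.
OPTIMIZE TABLE doc.events PARTITION (day = '2022-05-01') WITH (flush = true);
```

Re-running a `sys.shards` query over `translog_stats` afterwards shows whether anything was actually released; with replicas, retention may still be held back by the checkpoint issue discussed below.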

Shard replication. I am using Crate 4.6.7 for my evaluation.


4.6.8 / 4.7 include a fix for an issue with the checkpoint sync, which should significantly improve the situation. One way to force the sync is to temporarily deactivate the replicas and then activate them again. However, I would not recommend doing this on a production system without a support engineer overseeing it.
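A sketch of what deactivating and reactivating the replicas could look like for one partition, assuming a hypothetical table `doc.events` partitioned by `day` with one configured replica. The partition has no redundancy between the two ALTER statements, which is why the caution above applies.

```sql
-- Hypothetical table and partition value; adjust to your schema.
-- 1. Remove the replicas of one partition (no redundancy from here on).
ALTER TABLE doc.events PARTITION (day = '2022-05-01')
    SET (number_of_replicas = 0);

-- 2. Re-enable them; new replicas are recovered from the primaries.
ALTER TABLE doc.events PARTITION (day = '2022-05-01')
    SET (number_of_replicas = 1);

-- 3. Watch recovery until all replica shards report state STARTED.
SELECT "primary", state, count(*) AS shards
FROM sys.shards
WHERE schema_name = 'doc' AND table_name = 'events'
GROUP BY "primary", state;
```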

Just FYI, the current release is 4.8.1, and 5.0.0 will be released in the coming weeks.

Thanks @proddata for confirming the bug. When the cluster was restarted, the translog was dropped.
