Questions around optimize and table segments

Hi Crate team,
I read about the optimize command, which merges a table's segments on disk to improve read performance, and a few questions came up.

  1. Is there any way I can see how the segments are distributed for my table or for a table partition?
  2. I have only inserted rows into a table partition using JDBC, without any DELETE or UPDATE operations. Is there a chance of spread-out segments in this case? Would running the optimize command further improve read performance?
  3. The documentation says that CrateDB performs the optimization in the background. Is there any way to check whether it is complete or still pending?

Hi @vinayak.shukre,

  1. Yes, the system table sys.segments exposes information about segments, including the table (and partition) each segment belongs to, the segment size, the number of documents, and more.

  2. Even when running only INSERT statements, new segments are created and optimizations (merges) can happen. As per the Lucene documentation, segments are written (flushed) under two conditions:

    A flush is triggered when there are enough added documents since the last flush. Flushing is triggered either by RAM usage of the documents (see IndexWriterConfig.setRAMBufferSizeMB(double)) or the number of added documents (see IndexWriterConfig.setMaxBufferedDocs(int)).

    Independently of that, the merge policy prefers larger segments and can therefore merge the smaller segments created by either of those two conditions.
    You can see this yourself by running INSERT statements on a table and watching sys.segments. Initially, new segments are created rather quickly, but at some point the number of segments decreases as optimizations kick in and start consolidating segments.

  3. I don’t think any information on ongoing or past optimizations is currently exposed. Optimizations typically don’t require much attention, as they are mostly a housekeeping task that keeps the overhead caused by many segments low.
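The flush-and-merge behaviour described in point 2 can be illustrated with a toy model. This is a minimal sketch, not Lucene's code: the thresholds `RAM_BUFFER_BYTES`, `MAX_BUFFERED_DOCS` and `MERGE_FACTOR` are hypothetical values, and the merge rule is a simplified log-style policy (merge a fixed number of similar-sized segments into one), not Lucene's actual merge policy. It only shows how flushing on RAM usage or document count creates small segments that are later consolidated into larger ones.

```python
from dataclasses import dataclass, field

# Hypothetical toy thresholds, NOT Lucene's or CrateDB's defaults.
RAM_BUFFER_BYTES = 1_000   # flush once buffered documents use this many bytes
MAX_BUFFERED_DOCS = 50     # ...or once this many documents are buffered
MERGE_FACTOR = 4           # merge this many segments of the same size tier


def size_tier(docs: int) -> int:
    """Return k such that MERGE_FACTOR**k <= docs < MERGE_FACTOR**(k + 1)."""
    k = 0
    while docs >= MERGE_FACTOR ** (k + 1):
        k += 1
    return k


@dataclass
class ToyIndexWriter:
    segments: list[int] = field(default_factory=list)  # doc count per segment
    history: list[int] = field(default_factory=list)   # segment count after each flush
    buffered_docs: int = 0
    buffered_bytes: int = 0

    def add_document(self, size_bytes: int) -> None:
        self.buffered_docs += 1
        self.buffered_bytes += size_bytes
        # Flush on whichever condition is met first: RAM usage or doc count.
        if (self.buffered_bytes >= RAM_BUFFER_BYTES
                or self.buffered_docs >= MAX_BUFFERED_DOCS):
            self.flush()

    def flush(self) -> None:
        # Write the buffered documents as a new, small segment.
        self.segments.append(self.buffered_docs)
        self.buffered_docs = 0
        self.buffered_bytes = 0
        self.maybe_merge()
        self.history.append(len(self.segments))

    def maybe_merge(self) -> None:
        # Toy merge rule: while some size tier holds MERGE_FACTOR segments,
        # replace them with one larger segment (merges may cascade upward).
        while True:
            tiers: dict[int, list[int]] = {}
            for i, docs in enumerate(self.segments):
                tiers.setdefault(size_tier(docs), []).append(i)
            full = [idx for idx in tiers.values() if len(idx) >= MERGE_FACTOR]
            if not full:
                return
            chosen = set(full[0][:MERGE_FACTOR])
            merged = sum(self.segments[i] for i in chosen)
            self.segments = [d for i, d in enumerate(self.segments) if i not in chosen]
            self.segments.append(merged)


writer = ToyIndexWriter()
for _ in range(1000):
    writer.add_document(size_bytes=10)  # small docs: the doc-count limit triggers each flush
print(writer.history)   # [1, 2, 3, 1, 2, 3, 4, 2, 3, 4, 5, 3, 4, 5, 6, 1, 2, 3, 4, 2]
print(writer.segments)  # [800, 200]
```

The segment count climbs after each flush and drops whenever a merge kicks in, which mirrors the pattern you can observe in sys.segments during a pure INSERT workload. With larger documents (for example `size_bytes=40`), the RAM condition is reached first and each flushed segment holds fewer documents.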

Is there a particular practical reason why you want to take a closer look at segments, or is it just out of curiosity?


Hi @hammerhead,
Since having fewer, larger segments is said to improve read/query performance, I was exploring it.
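To check whether a table or partition consists of a few large segments or many small ones, one option is to query sys.segments and group the rows per partition. This is a minimal sketch in Python, assuming the sys.segments columns named below (check the CrateDB documentation for your version) and a DB-API connection that uses `?` placeholders, such as one from the `crate` Python client. `fetch_segments`, `summarize` and `SEGMENTS_SQL` are hypothetical helpers for illustration.

```python
from collections import defaultdict

# Assumed sys.segments columns: table_name, partition_ident, shard_id,
# num_docs, size (bytes). Rows may also include replica shards.
SEGMENTS_SQL = """
SELECT table_name, partition_ident, shard_id, num_docs, size
FROM sys.segments
WHERE table_name = ?
"""


def fetch_segments(conn, table_name):
    """Fetch one row per segment through any DB-API connection.

    Assumed setup, e.g.:
        from crate import client
        conn = client.connect("http://localhost:4200")
    """
    cursor = conn.cursor()
    cursor.execute(SEGMENTS_SQL, (table_name,))
    return cursor.fetchall()


def summarize(rows):
    """Group segment rows by (table, partition) and report the segment
    count, total docs, total bytes and average segment size in bytes."""
    groups = defaultdict(lambda: {"segments": 0, "docs": 0, "bytes": 0})
    for table, partition, _shard, num_docs, size in rows:
        g = groups[(table, partition)]
        g["segments"] += 1
        g["docs"] += num_docs
        g["bytes"] += size
    for g in groups.values():
        g["avg_bytes"] = g["bytes"] / g["segments"]
    return dict(groups)
```

Comparing `summarize(fetch_segments(conn, "my_table"))` (with a hypothetical table name) before and after running optimize gives a rough view of how the segment count and average segment size per partition changed. The same grouping could also be done server-side with `GROUP BY table_name, partition_ident` in the query itself.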