marklogic resource-cleanup

MarkLogic - When will deleted fragments be cleaned up?


MarkLogic version 9.0-7.2

We have 6M records in MarkLogic with ingestion & indexing happening regularly based on business requirements and document availability from source.

We observed that disk usage differed across the 3 nodes, and the difference was large enough (around 30 GB) for us to start an investigation. There were also low-disk warnings and errors on the Monitor -> Disk usage dashboard.

After investigating, we found that some nodes had fewer deleted fragments than others, and we assume this is the main cause of the difference in disk usage.

So, two questions:

  • How do we clean up deleted fragments in all forests, including replicas?
    • Is there anything we can trigger to force the cleanup?
  • Why did the master forests have far more deleted fragments than the replicas?



Solution

  • Deleted fragments are cleaned up as part of the Merging process, which is the dynamic tuning process MarkLogic uses to optimize performance.

    The Merge Priority setting for the database determines the CPU scheduling priority for merges. If it is set to lower, the server uses a lower-priority scheduler to decide when merges run. This means that if your server sees a constant level of activity, that activity may limit how much merging the system is able to do. Raising the Merge Priority setting will allow the system to do more merging, which will clear more deleted fragments.

    Primary forests will typically see higher activity, since they serve query traffic along with updates and deletes. The journal frames are then replicated to the replica forests. My understanding is that the workload against the replicas is probably lower, so they can get more low-priority CPU cycles for merging.

    Check out the Understanding and Controlling Database Merges section of the documentation for more details.

    Merges can also be triggered manually at either the forest level or the database level; see the Manually Initiating a Merge section. Use caution when initiating a full database-level merge, as it can be very resource intensive and will negatively impact query and ingest performance while the merges are running.

    You can also tune the merge settings if you determine that the defaults do not meet your requirements; see the Configuring Merge Policy Rules section.
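
To make the manual, forest-level option more concrete, here is a minimal Python sketch rather than a definitive procedure. It assumes the Management REST API on port 8002 accepts `POST /manage/v2/forests/{name}` with a form field `state=merge`, which is my reading of the documentation; please verify it against your 9.0-7.2 installation. The host name, forest names, fragment counts, and the 25% threshold are all hypothetical, and the counts are expected to be copied in from the forest status pages.

```python
import urllib.request
from urllib.parse import quote, urlencode


def deleted_ratio(active, deleted):
    """Share of a forest's fragments that are deleted but not yet merged out."""
    total = active + deleted
    return deleted / total if total else 0.0


def forests_to_merge(counts, threshold=0.25):
    """Return forest names whose deleted ratio meets the threshold.

    counts maps forest name -> (active fragments, deleted fragments).
    The 0.25 default is an arbitrary illustration, not a MarkLogic setting.
    """
    return sorted(
        name for name, (active, deleted) in counts.items()
        if deleted_ratio(active, deleted) >= threshold
    )


def build_merge_request(host, forest, port=8002):
    """Build (but do not send) a forest-level merge request.

    Assumption: the Management API treats state=merge as "start a merge".
    """
    url = f"http://{host}:{port}/manage/v2/forests/{quote(forest, safe='')}"
    body = urlencode({"state": "merge"}).encode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def send(request, user, password):
    """Send the request using HTTP digest authentication; return the status code."""
    manager = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    manager.add_password(None, request.full_url, user, password)
    opener = urllib.request.build_opener(
        urllib.request.HTTPDigestAuthHandler(manager)
    )
    with opener.open(request) as response:
        return response.status


if __name__ == "__main__":
    # Made-up counts for illustration only.
    counts = {
        "Documents-1": (900_000, 100_000),
        "Documents-2": (600_000, 400_000),
        "Documents-1-replica": (1_000_000, 0),
    }
    for forest in forests_to_merge(counts):
        request = build_merge_request("ml-node1", forest)
        print("would POST", request.full_url, request.data)
        # send(request, "admin-user", "admin-password")  # uncomment to run for real
```

The `send` call is left commented out on purpose: as noted above, a merge is resource intensive, so it is safer to review the list of selected forests first and run the merges one forest at a time during a quiet period.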