
Any parameter to set max forest size in MarkLogic?


Are there any parameters to control the maximum size of a forest? Or are there any best practices or operational scripts for managing forest size?


Solution

  • There is not a hard limit on forest size, but there are some general guidelines. Factors such as the size and type of documents, indexes, hardware specs, and usage patterns can affect performance.

    Guidance from chapter 3 of the MarkLogic Scalability, Availability, and Failover Guide:

    Forest Sizes Per Data Node Host

    As your content grows in size, you might need to add forests to your database. There is no limit to the number of forests in a database, but there are some guidelines for individual forest sizes where, if the guidelines are greatly exceeded, then you might see performance degradation.

    The numbers in these guidelines are not exact, and they can vary considerably based on the content. Rather, they are approximate, rule-of-thumb sizes. These numbers are based on average sized fragments of 10k to 100k. If your fragments are much larger on average, or if you have a lot of large binary documents, then the forests can probably be larger before running into any performance degradation.

    The rule-of-thumb maximum size for a forest is 512GB. Each forest should ideally have two vCPUs of processing power available on its host, with 8GB memory per vCPU. For example, a host with eight vCPUs and 64GB memory can manage four 512GB forests. For bare-metal systems, a hardware thread (hyperthread) is equivalent to a vCPU. It is a good idea to run performance tests with your own workload and content. If you have many configured indexes, you may need more memory. Memory requirements may also increase over time as projects evolve and forests grow with more content and more indexes.
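The sizing arithmetic above (512GB per forest, two vCPUs per forest, 8GB memory per vCPU) can be sketched as a small capacity check. This is an illustrative script, not a MarkLogic API; the constants and function name are assumptions based on the guidelines quoted from the guide.

```python
# Rule-of-thumb forest capacity per host, per the guidance above:
# ~512 GB max per forest, 2 vCPUs per forest, 8 GB RAM per vCPU.
MAX_FOREST_SIZE_GB = 512
VCPUS_PER_FOREST = 2
RAM_GB_PER_VCPU = 8

def max_forests_per_host(vcpus: int, ram_gb: int) -> int:
    """Return how many ~512 GB forests a host can manage,
    taking the tighter of the CPU and memory constraints."""
    by_cpu = vcpus // VCPUS_PER_FOREST
    by_ram = ram_gb // (VCPUS_PER_FOREST * RAM_GB_PER_VCPU)
    return min(by_cpu, by_ram)

# The guide's example: eight vCPUs and 64 GB memory -> four 512 GB forests.
print(max_forests_per_host(8, 64))  # -> 4
```

Treat the result as a starting point only; as the guide notes, heavy indexing, large binaries, and workload patterns can shift the real limit, so validate with performance tests on your own content.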