
How do registered queries work in MarkLogic?


The documentation says: "Registering a cts:query expression stores a pre-evaluated version of the expression".

Does this mean that registering a query stores the actual result (the XML documents) of the query in some cache?

If the above is true:

  • Will it affect performance if the query matches a large set of documents?
  • Which cache is used?
  • When an existing document is updated, or a new one is uploaded, that satisfies the registered query, is the cached (pre-evaluated) result updated too?

If the above is false, how do registered queries work internally?


Solution

  • Registering a cts:query is useful for complex queries that you expect to reuse, because it stores the intersected term-list result so the term lists don't have to be intersected again.

    Imagine a complex cts:query with a lot of boolean constraints. When you reuse it as part of a larger query, the server will have cached each individual term list in the Term List Cache, but it still has to do all the intersection work for the larger query. By calling cts:register() on the larger query, you tell the server to store the intersected result; cts:register() returns an ID, and passing that ID to cts:registered-query() lets you reuse the stored result. This saves server CPU.

    It was originally added for customers who wanted to define a dynamic search domain against which all searches would be executed (such as the materials a user has purchased from the site). When the search domain got sufficiently complex, a lot of effort went into just re-intersecting the same term lists, and registered queries helped: when a user logs in, register their domain as a query, then reuse it. The term-list intersection work is then only between their search and their pre-materialized domain.

    The server does a pretty good job of registering queries implicitly, so doing it manually is not as necessary as it used to be. Registration is cheap: it's just a lookup table mapping query hashes to the corresponding intersected term-list results. Since the result was already calculated on first use, storing it costs no extra performance, just a bit of memory. Only a fixed number of registered queries is allowed, and less-used ones are purged.

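    The cache is maintained per stand, so it doesn't go stale even as you change data. A stand's contents are not rewritten in place (new and updated documents are written to newer stands, and superseded versions are marked as deleted), so a cached per-stand result remains valid and only the newer stands need fresh intersection work. Magic!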

    I cover this in more depth in Inside MarkLogic Server, including a usage pattern that makes the feature easy to use.
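To make the mechanism described in the answer concrete, here is a toy Python model of it. It is a sketch under stated assumptions, not MarkLogic's implementation or API: `Stand`, `Database`, `registered_query`, and `MAX_REGISTERED` are hypothetical names; term lists are modeled as Python sets of fragment IDs; a query is a plain AND of terms; and least-recently-used eviction is assumed as a stand-in for "less-used ones are purged". Deletes, merges, and timestamps are left out.

```python
from collections import OrderedDict

# Hypothetical capacity: the answer only says "a fixed number" is allowed.
MAX_REGISTERED = 2


class Stand:
    """Toy stand (hypothetical): term lists built once, never modified."""

    def __init__(self, fragments):
        # fragments: {fragment_id: set_of_terms}. Invert into term lists,
        # i.e. {term: set_of_fragment_ids}.
        self.term_lists = {}
        for frag_id, terms in fragments.items():
            for term in terms:
                self.term_lists.setdefault(term, set()).add(frag_id)
        # Per-stand cache of intersected results, keyed by the query.
        # (A dict hashes its keys, standing in for "a table of query hashes".)
        self.registered = OrderedDict()
        self.intersections = 0  # how many times real intersection work ran

    def intersect(self, terms):
        """The expensive step: AND together the term lists for `terms`."""
        self.intersections += 1
        lists = [self.term_lists.get(term, set()) for term in terms]
        return frozenset(set.intersection(*lists)) if lists else frozenset()

    def registered_query(self, terms):
        """Return the cached intersection, computing it on first use."""
        key = frozenset(terms)  # AND is order-independent
        if key in self.registered:
            self.registered.move_to_end(key)  # mark as recently used
            return self.registered[key]
        result = self.intersect(key)
        self.registered[key] = result
        if len(self.registered) > MAX_REGISTERED:
            self.registered.popitem(last=False)  # assumed LRU purge
        return result


class Database:
    def __init__(self):
        self.stands = []

    def insert(self, fragments):
        # New content goes into a new stand; older stands and their caches
        # are left untouched, so their cached results cannot go stale.
        self.stands.append(Stand(fragments))

    def search(self, terms):
        hits = set()
        for stand in self.stands:
            hits |= stand.registered_query(terms)
        return hits


db = Database()
db.insert({1: {"red", "shoe"}, 2: {"red", "hat"}, 3: {"blue", "shoe"}})
domain = ["red", "shoe"]
print(sorted(db.search(domain)))  # [1]
print(sorted(db.search(domain)))  # [1], served from the first stand's cache
db.insert({4: {"red", "shoe"}})
print(sorted(db.search(domain)))  # [1, 4]
print([s.intersections for s in db.stands])  # [1, 1]
```

The last line shows why a per-stand cache fits the question about updates: adding a stand leaves the first stand's cached intersection untouched, and only the new stand does fresh intersection work.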