Search code examples
xquerymarklogic

How to get CustomerId from millions of records using Xquery in Marklogic


We have a large database, and I am supposed to get only CustomerId's from the entire database without compromising the performance. What is the best way to do it?


Solution

  • If you are looking to report all of the CustomerId values from a large database, then having some sort of backing index will be the fastest and most efficient. With an index, retrieving the distinct list of values from a lexicon with a functions such as:

    • cts:values() e.g. cts:values(cts:element-reference('CustomerId')) or cts:values(cts:path-reference('/Customer/CustomerId'))
    • cts:element-values() e.g. cts:element-values(xs:QName("CustomerId"))

    You could also create a TDE that has a CustomerId field and retrieve the values with an Optic query.

    If you wanted to report the values without creating an index, then you could create a CoRB batch job to select the docs that have that element, return the value(s) from the process module, and write the values to an output file apply the EXPORT-FILE-SORT=ascending|distinct option.