Search code examples
xquerymarklogic

How to get total number of nodes in an XML in MarkLogic Database


I have an XML like below in my database:

<PersonalData>
  <Person>
    <Name></Name>
    <Age></Age>
    <AccountNo>
      <Number>123<Number>
      <SwiftCode>1235<SwiftCode>
    </AccountNo>
    <AccountNo>
      <Number>15523<Number>
      <SwiftCode>188235<SwiftCode>
    </AccountNo>
  </Person>
</PersonalData>

in this XML, I have multiple AccountNo nodes and I have around 1M similar records in my database. I want to identify the count of AccountNo nodes in my entire database.


Solution

  • One way in which you can report the count of AccountNo elements would be to use an XPath and count:

    count(//AccountNo)
    

    You can also use cts:search and specify the AccountNo in the $expression XPath, and then count() the results:

    count(cts:search(//AccountNo, cts:true-query()))
    

    Another way to get a count of all the distinct AccountNo elements would be to run a CoRB job to select the docs that have those elements, and then in the process module return a line for every element in the doc and write the results to a text file. Below is an example OPTIONS-FILE that could be used to achieve that:

    URIS-MODULE=INLINE-XQUERY|let $uris := cts:uris('',(),cts:element-query(xs:QName("AccountNo"), cts:true-query())) return (count($uris), $uris)
    PROCESS-MODULE=INLINE-XQUERY|declare variable $URI external; doc($URI)//AccountNo ! 1
    PROCESS-TASK=com.marklogic.developer.corb.ExportBatchToFileTask
    EXPORT-FILE-NAME=AccountNoCounts.txt
    DISK-QUEUE=true
    

    Then you could get the line count from the result file, which would tell you have many elements there are: wc -l AccountNoCounts.txt

    If you need to be able to get this count often, and need the response to be fast, you could create a TDE that projects rows for each of the AccountNo elements and then could and could select the count with SQL (e.g. SELECT count(1) FROM Person.AccountNo) or use the Optic API against that TDE and op.count().