I have one million XML documents like this in my MarkLogic staging database.
<Details>
<Name>AA</Name>
<EmpId>123</EmpId>
<Account>
<AccountNo>111</AccountNo>
<IFSC>ABC</IFSC>
</Account>
<Account>
<AccountNo>222</AccountNo>
<IFSC>DEF</IFSC>
</Account>
</Details>
In this XML, an employee has multiple account numbers. From this, I want to identify any of the employees that have the same account number. Finding out unique account number from all 1M documents, and then check if the account number is matching to multiple employee ids.
How do I achieve this?
One way in which you can list all of the AccountNo
values that appear in more than one employee document would be to use the cts:value-co-occurences()
method with references to an element range index of AccountNo
and cts:uri-reference()
(which is available when the URI lexicon is enabled). Return the results as a map, with the AccountNo
as the key and the document URI(s) as the value. Then filter the items in the map and report which AccountNo
is associated with more than one document URI.
let $accountNumber-to-URI :=
cts:value-co-occurrences(
cts:element-reference(xs:QName("AccountNo")),
cts:uri-reference(),
"map")
for $accountNumber in map:keys($accountNumber-to-URI)
where tail(map:get($accountNumber-to-URI, $accountNumber))
return $accountNumber
Note that in order to be able to do this, you would need to have a range index on the AccountNo
element.