
Xquery on MarkLogic using OR


This is a newbie MarkLogic question. Imagine an xml structure like this, a condensation of my real business problem:

<Person id="1">
  <Name>Bob</Name>
  <City>Oakland</City>
  <Phone>2122931022</Phone>
  <Phone>3123032902</Phone>
</Person>

Note that a document can and will have multiple Phone elements.

I have a requirement to return information from EVERY document that has a Phone element that matches ANY of a list of phone numbers. The list may have a couple of dozen phone numbers in it.

I have tried this:

let $a := cts:word-query("3738494044")
let $b := cts:word-query("2373839383") 
let $c := cts:word-query("3933849383") 
let $or := cts:or-query( ($a, $b, $c) )
return cts:search(/Person/Phone, $or)

which does the query properly, but it returns a sequence of Phone elements inside a Results element. My goal is instead to return all the Name and City elements along with the id attribute from the Person element, for every matching document. Example:

<results>
  <match id="18" phone="2123339494" name="bob" city="oakland"/>
  <match id="22" phone="3940594844" name="mary" city="denver"/>
etc...
</results>

So I think I need some form of cts:search that offers this boolean capability but also lets me specify which part of each document gets returned. At that point I could further process the result with XPath. I need to do this efficiently; for example, I think it would NOT be efficient to return a list of document URIs and then query each document in a loop. Thanks!


Solution

  • Your approach is not as bad as you might think. There are only a few changes necessary to make it work as you like.

    First of all, you are better off using cts:element-value-query instead of cts:word-query, because it limits the search to the values of a specific element. It needs no extra configuration: element values are resolved from the universal index that MarkLogic always maintains. An element range index on Phone is not required for this query; such an index would come into play with cts:element-range-query instead.
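    If you do configure an element range index on Phone (an assumption: a string range index in the default collation, which the question does not mention), the query that can use it is cts:element-range-query with the "=" operator. A minimal sketch; without such an index, this call raises an error:

    ```xquery
    xquery version "1.0-ml";

    (: ASSUMPTION: an element range index of type xs:string exists on Phone,
       using the same collation as this module's default collation. :)
    cts:search(
      /Person/Phone,
      cts:element-range-query(
        xs:QName("Phone"),
        "=",
        (: with "=", a sequence of values matches if ANY of them is equal :)
        ("3738494044", "2373839383", "3933849383")
      )
    )
    ```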

    Secondly, there is no need for cts:or-query. Both cts:word-query and cts:element-value-query (as well as the other related query constructors) accept multiple search strings as a single sequence argument, and those strings are automatically treated as an OR.
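    To illustrate the implicit OR, the two queries below should match the same documents: the first passes the numbers as one sequence, the second spells out a cts:or-query with one sub-query per number. Comparing the two estimates is just a quick sanity check, not something you need in production code:

    ```xquery
    xquery version "1.0-ml";

    (: The phone numbers from the question; any sequence of strings works. :)
    let $phones := ("3738494044", "2373839383", "3933849383")

    (: One query with a sequence argument: a match on ANY value counts. :)
    let $implicit-or := cts:element-value-query(xs:QName("Phone"), $phones)

    (: The equivalent explicit form, built one sub-query per value. :)
    let $explicit-or :=
      cts:or-query(
        for $p in $phones
        return cts:element-value-query(xs:QName("Phone"), $p)
      )

    (: Both estimates should report the same number of matching fragments. :)
    return (
      xdmp:estimate(cts:search(/Person, $implicit-or)),
      xdmp:estimate(cts:search(/Person, $explicit-or))
    )
    ```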

    Thirdly, the phone numbers are the 'primary key' of your result, so returning a list of all matching Phone elements is the way to go. Just keep in mind that the returned Phone elements still know where they came from, so you can easily use XPath to navigate to their parent and siblings.
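    A small sketch of that navigation, using one of Bob's numbers from the example document in the question. The element name found and its attributes are illustrative only; xdmp:node-uri returns the URI of the document a node belongs to:

    ```xquery
    xquery version "1.0-ml";

    for $phone in cts:search(
                    /Person/Phone,
                    cts:element-value-query(xs:QName("Phone"), "3123032902"))
    let $person := $phone/..    (: parent axis: the enclosing Person element :)
    return
      (: "found" is a hypothetical element name used only for this example :)
      <found uri="{xdmp:node-uri($phone)}"
             id="{$person/@id}"
             name="{$person/Name}"
             city="{$phone/../City}"/>  (: a sibling, reached via the parent :)
    ```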

    Fourthly, there is nothing wrong with looping over the search results. It may sound odd, but in MarkLogic Server the extra cost is negligible. Most performance is lost when you return many results (more than several thousand), in which case most of the time goes into serializing them. If you are likely to handle large result sets, it is wise to use pagination from the start.
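    One way to paginate is a positional range applied directly to cts:search, which MarkLogic can use to fetch only the requested page. The $page and $page-size values below are hypothetical; in a real application they would come from the request:

    ```xquery
    xquery version "1.0-ml";

    (: Hypothetical paging parameters. :)
    let $page      := 2
    let $page-size := 50
    let $start     := ($page - 1) * $page-size + 1
    let $end       := $start + $page-size - 1

    let $query := cts:element-value-query(
                    xs:QName("Phone"),
                    ("3738494044", "2373839383", "3933849383"))

    return
      <results>{
        (: The [$start to $end] predicate sits directly on cts:search,
           so only this page of matches is returned and serialized. :)
        for $phone in cts:search(/Person/Phone, $query)[$start to $end]
        return <match id="{$phone/../@id}" phone="{$phone}"/>
      }</results>
    ```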

    To get what you ask, you could use the following code:

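    <results>{
        (: Search at the Phone level: each matching Phone element is one result. :)
        for $phone in
            cts:search(
                doc()/Person/Phone,
                (: A sequence of values is treated as "match any of these". :)
                cts:element-value-query(
                    xs:QName("Phone"),
                    ("3738494044", "2373839383", "3933849383")
                )
            )
        (: Navigate up to the parent Person to pick up the id, Name and City. :)
        let $person := $phone/..
        return
            <match id="{data($person/@id)}"
                   phone="{data($phone)}"
                   name="{data($person/Name)}"
                   city="{data($person/City)}"/>
    }</results>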
    

    Best of luck.