
MarkLogic using QBE - Matching Array Items


I am trying to match an array exactly, without resorting to a brute-force query that excludes the other possible records:

JSON 1:

{
   "lastNames":["smith", "val", "zuul"]
}

JSON 2:

{
   "lastNames":["smith"]
}

Based on the JSON above, I need to retrieve only the second document using the following QBE:

{ "$query": {"lastNames": ["smith"]} }

However, this retrieves both documents, and the MarkLogic documentation describes the same behavior.

Please see https://docs.marklogic.com/guide/search-dev/qbe#id_16310 (Matching Array Items).

The brute-force approach would look like:

{
   "$query":{
      "$and":[
         { "lastNames":[ "smith" ] },
         { "$not":{ "lastNames":[ "val" ] } },
         { "$not":{ "lastNames":[ "zuul" ] } }
      ]
   }
}

As you can see, we need to know in advance which values to exclude.

Is there any other approach using QBE, Node.js, Java, or XQuery that avoids the brute-force approach?


Solution

  • Short answer: QBE cannot match an exact document structure, and that includes the exact contents of an array.

    Context: QBE executes against the MarkLogic universal index and range indexes.

    On JSON documents, the MarkLogic universal index contains property names and atomic values:

    • each object provides a bag of indexed names
    • each array contributes to a bag of multiple values
    • each atomic value is indexed

    Matching against indexes is important for performance and scale.

    Put another way, inspecting the actual structure of every document, instead of the indexes projected from the documents, would be impractical for almost any production database.
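
To make the index behavior concrete, here is a toy model in plain JavaScript. It is only a sketch and not MarkLogic's actual index implementation: the helper names `indexTerms` and `matches` are hypothetical, and the "name=value" term format is an assumption chosen for illustration. It shows why a query for `lastNames` containing `smith` matches both documents from the question once array positions and lengths are discarded.

```javascript
// Toy model only: this is NOT how MarkLogic stores its universal index.
// It flattens a JSON document into a set of "name=value" terms and
// discards array positions and array lengths along the way.
function indexTerms(doc) {
  const terms = new Set();
  const visit = (name, value) => {
    if (Array.isArray(value)) {
      // Every array item is recorded under the enclosing property name.
      value.forEach((item) => visit(name, item));
    } else if (value !== null && typeof value === "object") {
      for (const [childName, childValue] of Object.entries(value)) {
        visit(childName, childValue);
      }
    } else {
      terms.add(`${name}=${JSON.stringify(value)}`);
    }
  };
  visit("", doc);
  return terms;
}

// An index-style match only asks whether the term exists in the document.
function matches(doc, name, value) {
  return indexTerms(doc).has(`${name}=${JSON.stringify(value)}`);
}

const json1 = { lastNames: ["smith", "val", "zuul"] };
const json2 = { lastNames: ["smith"] };

console.log(matches(json1, "lastNames", "smith")); // true
console.log(matches(json2, "lastNames", "smith")); // true
```

Both documents contribute the same `lastNames` term for `smith`, so nothing in this model can tell a one-item array from a three-item array.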

    Some alternatives:

    • Inspect the matched documents to eliminate the false positives (preferably in server-side JavaScript on the e-node).
    • Materialize the concatenation of the array items, joined with a _ separator character, as a property in the document that is used only in queries.
    • Solve the problem with TDE and the Optic API.
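
The first two alternatives can be sketched in plain JavaScript. This is a minimal illustration under stated assumptions, not a MarkLogic API: the function names and the `lastNamesKey` property name are hypothetical, the candidate documents are assumed to have already been fetched as plain objects, and array order is assumed to matter (sort the items first if it does not).

```javascript
// True only when the property holds exactly the expected items, in order.
function hasExactArray(doc, name, expected) {
  const actual = doc[name];
  return (
    Array.isArray(actual) &&
    actual.length === expected.length &&
    actual.every((item, i) => item === expected[i])
  );
}

// Alternative 1: drop false positives from the documents the query matched.
function filterExact(candidates, name, expected) {
  return candidates.filter((doc) => hasExactArray(doc, name, expected));
}

// Alternative 2: add a query-only property holding the joined array items.
// Assumes no item contains the "_" separator; otherwise keys could collide.
function withArrayKey(doc, name, keyName) {
  return { ...doc, [keyName]: doc[name].join("_") };
}

const candidates = [
  { lastNames: ["smith", "val", "zuul"] },
  { lastNames: ["smith"] },
];

console.log(filterExact(candidates, "lastNames", ["smith"]));
console.log(withArrayKey(candidates[0], "lastNames", "lastNamesKey"));
```

With the second alternative, the documents would be stored with the extra property, and the query would target the `lastNamesKey` value `smith` instead of the array, so the three-item document (whose key is `smith_val_zuul`) should no longer qualify.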

    Expanding on the last alternative:

    • TDE could concatenate the array items as a value in a single-column view.
    • Or, TDE could project each array item as a separate row. The query could then group on the document fragment id, sample the row, filter out groups where the count is > 1, and join the documents.
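
The row-per-item idea can be simulated in plain JavaScript to show the grouping logic. This is only a sketch of the logic, not TDE or Optic API code: the `rows` data, the `docId` and `lastName` column names, and the `docsWithOnly` helper are all hypothetical, and `docId` stands in for the document fragment id.

```javascript
// Hypothetical rows, as a template might project them: one row per array item.
const rows = [
  { docId: "/doc1.json", lastName: "smith" },
  { docId: "/doc1.json", lastName: "val" },
  { docId: "/doc1.json", lastName: "zuul" },
  { docId: "/doc2.json", lastName: "smith" },
];

// Group by document, count the rows, keep one sample value per group,
// then keep only single-row groups whose sample is the wanted value.
function docsWithOnly(allRows, wanted) {
  const groups = new Map();
  for (const { docId, lastName } of allRows) {
    const group = groups.get(docId) ?? { count: 0, sample: lastName };
    group.count += 1;
    groups.set(docId, group);
  }
  return [...groups.entries()]
    .filter(([, group]) => group.count === 1 && group.sample === wanted)
    .map(([docId]) => docId);
}

console.log(docsWithOnly(rows, "smith")); // only "/doc2.json" remains
```

The surviving ids correspond to the final "join the documents" step: each remaining group identifies one document whose array contains only the wanted value.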

    Hoping that's useful,