Why is MarkLogic structured word or value query returning more matches than expected?


The purpose of my structured query is to match all documents that have a JSON property whose value contains a specific substring. I wrote the following code, which builds this query with the MarkLogic Java API:

        var jsonProperty = queryBuilder.jsonProperty("xyz");
        String[] wordOptions = {"case-insensitive", "wildcarded"};
        return queryBuilder.word(jsonProperty, null, wordOptions, 0, "*m-Em i*");

For some reason, there are more search matches than expected. For example, a document whose "xyz" JSON property contains "PM-EM 926-2:2020" is matched, but it shouldn't be. What might be causing this?

I have also tried:

cts:search(fn:doc(), cts:json-property-word-query("xyz", "*m-Em I*", ("case-insensitive", "wildcarded")))

and it returns the expected matches, but I would rather stick with the structured query.


Solution

  • Do you get the same results if you add the "unfiltered" option to your cts:search()?

    "*m-Em I*" is not a single word. It is a phrase that contains the punctuation character "-", where "*m" has a leading wildcard and "I*" is a one-character word with a trailing wildcard.

    So, unless you have the necessary backing indexes, the index lookup is likely just searching for "Em". Because cts:search filters by default, it then discards the false positives and returns more accurate results.

    Take a look at the plan and see what your search winds up becoming:

    xdmp:plan(cts:search(fn:doc(), cts:json-property-word-query("xyz", "*m-Em I*", ("case-insensitive", "wildcarded"))))
    

    Also compare the results when you pass the "unfiltered" option to cts:search, or wrap the search in xdmp:estimate(), to see how many results index resolution alone produces before filtering is applied.
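
To make the filtered versus unfiltered distinction concrete, here is a minimal Java sketch of the idea. It is only a toy model, not how MarkLogic actually resolves queries: it assumes, as suggested above, that without wildcard indexes the index lookup effectively reduces to the single term "em", and that filtering then checks the full pattern against the stored value. The class and method names (`FilterSketch`, `unfilteredMatch`, `filteredMatch`) are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

public final class FilterSketch {

    // Toy stand-in for the full pattern *m-Em I* (case-insensitive):
    // a word ending in "m", a hyphen, "em", a space, a word starting with "i".
    private static final Pattern FULL_PATTERN =
            Pattern.compile("m-em i", Pattern.CASE_INSENSITIVE);

    // Toy "index resolution": ASSUMES only the plain term "em" can be
    // looked up, so any value containing the word "em" is a candidate.
    static boolean unfilteredMatch(String value) {
        List<String> words = Arrays.asList(
                value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"));
        return words.contains("em");
    }

    // Toy "filtering": re-check each candidate against the whole pattern.
    static boolean filteredMatch(String value) {
        return unfilteredMatch(value) && FULL_PATTERN.matcher(value).find();
    }

    public static void main(String[] args) {
        String falsePositive = "PM-EM 926-2:2020";
        String realMatch = "PM-EM ISO 926";

        System.out.println(unfilteredMatch(falsePositive)); // true
        System.out.println(filteredMatch(falsePositive));   // false
        System.out.println(filteredMatch(realMatch));       // true
    }
}
```

In this model, "PM-EM 926-2:2020" survives the index lookup because it contains the word "EM", and it is only rejected once the full pattern is checked. That mirrors the symptom in the question: an unfiltered search returns the extra document, while a filtered one does not.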