Tags: search, wildcard, marklogic, marklogic-8

Poor search performance for certain wildcard queries


I am having performance issues with wildcard searches for certain letter combinations, and I am not sure what else I need to do to improve them. All of my documents follow an envelope pattern that looks something like the following.

<pdbe:person-envelope>
    <person xmlns="http://schemas.abbvienet.com/people-db/model">
        <account>
            <domain/>
            <username/>
        </account>
        <upi/>
        <title/>
        <firstName>
            <preferred/>
            <given/>
        </firstName>
        <middleName/>
        <lastName>
            <preferred/>
            <given/>
        </lastName>
    </person>
    <pdbe:raw/>
</pdbe:person-envelope>

I have defined a field called name, which includes the firstName and lastName paths:

{
  "field-name": "name",
  "field-path": [
    {
      "path": "/pdbe:person-envelope/pdbm:person/pdbm:firstName",
      "weight": 1
    },
    {
      "path": "/pdbe:person-envelope/pdbm:person/pdbm:lastName",
      "weight": 1
    }
  ],
  "trailing-wildcard-searches": true,
  "trailing-wildcard-word-positions": true,
  "three-character-searches": true
}

When I run queries using search:search, some come back fast, whereas others come back slowly. These are filtered queries.

search:search("name:ha*",
  <options xmlns="http://marklogic.com/appservices/search">
    <constraint name="name">
      <word>
        <field name="name"/>
      </word>
    </constraint>
    <return-plan>true</return-plan>
  </options>
  )

I can see from the query plan that it is going to filter over all 136547 fragments in the database, yet this query is fast.

<search:query-resolution-time>PT0.013205S</search:query-resolution-time>
<search:snippet-resolution-time>PT0.008933S</search:snippet-resolution-time>
<search:total-time>PT0.036542S</search:total-time>

However, a search for name:tj* takes a long time, even though it filters over the same 136547 fragments.

<search:query-resolution-time>PT6.168373S</search:query-resolution-time>
<search:snippet-resolution-time>PT0.004935S</search:snippet-resolution-time>
<search:total-time>PT12.327275S</search:total-time>

The same indexes apply to both queries. Are there any other indexes I should enable when I am only searching via the field constraint? I also have these other indexes enabled on the database itself.

  "collection-lexicon": true,
  "triple-index": true,
  "word-searches": true,
  "word-positions": true

I tried an unfiltered query, but that did not help: I got a bunch of matches on the whole document, and not on the fields I wanted. I even tried setting the fragment root to just my person element, but that did not seem to help either.

  "fragment-root": [
    {
      "namespace-uri": "http://schemas.abbvienet.com/people-db/model",
      "localname": "person"
    }
  ]

Thanks for any ideas.


Solution

  • Fragment roots are helpful if you want to use a searchable expression for that person element, and mostly when it occurs multiple times in one document. Setting one won't constrain your current search to that element.

    In your case you enabled a number of relevant options, but the wildcard option only works for 4 characters or more. If you want to run wildcard searches with fewer characters, you need to enable the three, two and one character search options.

    Both search phrases mentioned above consist of two characters followed by a wildcard. Since you only enabled the three character option, MarkLogic could not resolve them from the indexes and had to rely on filtering. The fact that some run fast and some run slow is probably due to caching: if you repeat the same query, MarkLogic will return the result from cache.

    For performance testing you would either have to restart MarkLogic regularly to flush the caches, or search on (semi-)random strings so that MarkLogic cannot answer from cache. Or maybe both.

    HTH!
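
    As a rough sketch of the index change suggested above, the field definition from the question could be extended with the shorter character-search options. The two added property names ("two-character-searches" and "one-character-searches") are assumed to follow the same naming as the "three-character-searches" property already in use; check them against the database properties documentation for your MarkLogic version before applying.

    ```json
    {
      "field-name": "name",
      "field-path": [
        {
          "path": "/pdbe:person-envelope/pdbm:person/pdbm:firstName",
          "weight": 1
        },
        {
          "path": "/pdbe:person-envelope/pdbm:person/pdbm:lastName",
          "weight": 1
        }
      ],
      "trailing-wildcard-searches": true,
      "trailing-wildcard-word-positions": true,
      "three-character-searches": true,
      "two-character-searches": true,
      "one-character-searches": true
    }
    ```

    Expect larger indexes and slower document loads with these options on, and note that existing content has to be reindexed before a query such as name:tj* can benefit.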