MarkLogic index creation on XML value


XML structure looks something like this:

    <division>
      <sub-divison>
        <name>
          mime-type
        </name>
        <value>
          .jpeg
        </value>
      </sub-divison>
      <sub-divison>
        <name>
          status
        </name>
        <value>
          Work In Progress
        </value>
      </sub-divison>
    </division>

I need to index mime-type and status. What sort of indexing mechanism should be used: a path range index, or something else? Somehow I feel a path range index is not the right choice. Please suggest.


Solution

  • You do not have to do anything. MarkLogic automatically indexes element values, element-attribute values, and words. So without any configuration changes, you can write XPath expressions and construct cts:query terms for name and value, and those will use the built-in element value indexes. Try that, and see whether it is fast enough for your application. If it isn't, the problem might be in the query rather than the indexing. You can use xdmp:plan (http://docs.marklogic.com/xdmp:plan) or xdmp:query-trace (http://docs.marklogic.com/xdmp:query-trace) to see which indexes are used.

    However, there is some room for improvement. In your XML, the value element does not mean much of anything until the name element is also examined. By analogy, consider what SQL would do when you have a column name and a column value, and want to select WHERE name=? AND value=?. The SQL evaluator would have to look up by name, look up by value, and join the results. MarkLogic will do something similar, joining two lookup terms. At large scale, joins are expensive.
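To make the join concrete, here is a minimal in-memory sketch in Python. It is only an analogy: it uses the standard-library XML parser rather than MarkLogic, and the helper name `matches` is hypothetical. It shows that with the name/value layout, a match requires the name check and the value check to agree on the same parent element.

```python
import xml.etree.ElementTree as ET

# The original name/value layout from the question (whitespace trimmed).
OLD_XML = """
<division>
  <sub-divison>
    <name>mime-type</name>
    <value>.jpeg</value>
  </sub-divison>
  <sub-divison>
    <name>status</name>
    <value>Work In Progress</value>
  </sub-divison>
</division>
"""


def matches(division, wanted_name, wanted_value):
    """Return True if one sub-divison pairs wanted_name with wanted_value."""
    for sub in division.iter("sub-divison"):
        name = (sub.findtext("name") or "").strip()
        value = (sub.findtext("value") or "").strip()
        # Both checks must hold for the SAME parent element; this pairing
        # is the in-memory counterpart of joining two lookups.
        if name == wanted_name and value == wanted_value:
            return True
    return False


division = ET.fromstring(OLD_XML)
print(matches(division, "mime-type", ".jpeg"))  # name and value share a parent
print(matches(division, "status", ".jpeg"))     # ".jpeg" exists, but under another name
```

The second call returns False even though the value ".jpeg" occurs in the document, which is why a lookup by value alone is not enough with this layout.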

    So if you want optimal performance, refactor your XML to look more like this:

    <division>
     <sub-divison>
       <mime-type>.jpeg</mime-type>
       <status>Work In Progress</status>
     </sub-divison>
    </division>
    

    With XML like that, a query on a single element no longer requires any joins. A query using both status and mime-type will join the results of those two lookups, while the old XML would have to join from three lookups. As a side benefit, that XML is also easier for a human to read and understand.
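If existing documents need to be converted, the refactoring could be sketched as follows. This is a Python illustration using the standard-library parser, not a MarkLogic API; the function name `flatten` is hypothetical, and it assumes every name value is a valid XML element name.

```python
import xml.etree.ElementTree as ET

# The original name/value layout from the question (whitespace trimmed).
OLD_XML = """
<division>
  <sub-divison>
    <name>mime-type</name>
    <value>.jpeg</value>
  </sub-divison>
  <sub-divison>
    <name>status</name>
    <value>Work In Progress</value>
  </sub-divison>
</division>
"""


def flatten(division):
    """Build a new division whose single sub-divison has one element per name."""
    new_division = ET.Element("division")
    new_sub = ET.SubElement(new_division, "sub-divison")
    for sub in division.iter("sub-divison"):
        name = (sub.findtext("name") or "").strip()
        value = (sub.findtext("value") or "").strip()
        if name:  # skip pairs that have no usable name
            # Assumption: the name is a legal XML element name.
            ET.SubElement(new_sub, name).text = value
    return new_division


new_division = flatten(ET.fromstring(OLD_XML))
print(ET.tostring(new_division, encoding="unicode"))

# In the flattened form, each property is addressed directly by element name.
print(new_division.findtext("sub-divison/status"))
```

After the conversion, each property can be addressed by its own element name, with no pairing of name and value required.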