
MarkLogic index trouble


I have some issues getting my indexes right. I am trying to query a big (80 MB) document that has a lot of "map" elements, like this:

<oplage version="0.2" xmlns="http://www.nvsp.nl/oplage-mapping">
<meta-data>
<!--Generated by DIKW for NetwerkVSP STTip-->
<dateCreated>2014-04-03 13:23:16.885124</dateCreated>
</meta-data>
<map ppc6_id="1001WE" wijk_id="">
   <bruto>0</bruto>
   <stickers>0</stickers>
   <netto>0</netto>
</map>

Question 1 is actually: do I need to split up this document? It is 80 MB in size, and I needed to increase the in-memory list sizes. I read somewhere that having large documents in memory is generally not a good idea. This document holds an n:m relationship between two types of objects, "ppc6" objects and "wijk" objects. I need good performance from my 'aggregation' function, which finds all ppc6 objects that together make up a 'wijk' object. Typically there are around 500,000 ppc6 objects and 40,000 'wijk' objects.

I have made a fragment root for this document on the map element.

Element range index on map element.

Attribute range index on ppc6_id and wijk_id like

scalartype is string
parent namespace uri : "http://www.nvsp.nl/oplage-mapping"
parent local name : map
namespace uri equal to parent namespace (can this ever be *not* the same???)
localname : wijk_id, ppc6_id (not sure how to add more than one here?)

my query is like:

xquery version "1.0-ml";
declare namespace html = "http://www.w3.org/1999/xhtml";
declare namespace op = "http://www.nvsp.nl/oplage-mapping";

let $d := '/data/map/oplage-mapping.xml'
let $ids := fn:doc($d)//op:map/@wijk_id
let $u := fn:distinct-values($ids)
let $id2 := cts:element-attribute-values(xs:QName("map"),
                           xs:QName("wijk_id"),
                           "*")

return (fn:count($ids),fn:count($u))

We first went down the XPath route, but this does not perform. We need cts power, so we need indexes.

The query gives me an error like:

XDMP-ELEMATTRRIDXNOTFOUND: cts:element-attribute-values(fn:QName("", "map"), fn:QName("", "wijk_id"), "*") -- No string element-attribute range index for fn:QName("", "map") fn:QName("", "wijk_id") http://marklogic.com/collation/
on line 8
expr: cts:element-attribute-values(fn:QName("", "map"), fn:QName("", "wijk_id"), "*")

Index not found: string element-attribute range index for ... I have no clue where to go next.

I cannot find much documentation or many working examples on setting up specific range indexes.


Solution

  • Should you split up the 80-MB document? Yes, probably. As mentioned, MarkLogic wants documents to act like rows, not tables. An alternative design would be to create a map:map item and store that as a document, but that would be an unusual approach and I'm not sure it's really suitable. For example, there would be significant extra latency in each query to load up the map.

    What's wrong with your range index? Namespaces. Your attribute is in the empty namespace, not the parent element namespace. Default element namespace declarations (xmlns="...") do not apply to attributes.

    Also, when you call cts:element-attribute-values you need to supply the right namespace for the parent QName. And drop the '*' parameter: that's for cts:element-attribute-value-match, which matches wildcards against a lexicon. If you want all the values, it's more efficient to call cts:element-attribute-values with the empty sequence as the third argument, or to omit that argument.

    Finally, look into http://docs.marklogic.com/cts:value-co-occurrences and its map option. That may be exactly what you need.
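
Putting those points together, a corrected query might look roughly like the sketch below. It is not tested against your data and rests on some assumptions: that two string element-attribute range indexes exist with parent namespace http://www.nvsp.nl/oplage-mapping, parent localname map, an empty attribute namespace, localnames wijk_id and ppc6_id, and the default collation shown in the error message; and that each map element sits in its own fragment or document, since co-occurrences are computed per fragment. The variable names are made up for illustration.

```xquery
xquery version "1.0-ml";
declare namespace op = "http://www.nvsp.nl/oplage-mapping";

(: Distinct wijk_id values read straight from the range index.
   The parent QName carries the op: namespace; the attribute QName
   has no prefix because attributes do not inherit the default namespace. :)
let $wijk-ids := cts:element-attribute-values(
  xs:QName("op:map"),
  xs:QName("wijk_id"))

(: Assumption: both range indexes exist. The "map" option asks for the
   co-occurring (wijk_id, ppc6_id) pairs as a map:map keyed on wijk_id. :)
let $ppc6-by-wijk := cts:value-co-occurrences(
  cts:element-attribute-reference(xs:QName("op:map"), xs:QName("wijk_id")),
  cts:element-attribute-reference(xs:QName("op:map"), xs:QName("ppc6_id")),
  "map")

return (fn:count($wijk-ids), map:count($ppc6-by-wijk))
```

If the map comes back as expected, map:get($ppc6-by-wijk, $some-wijk-id) should give the ppc6_id values for one wijk, where $some-wijk-id is a placeholder for a value taken from $wijk-ids.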