
MarkLogic: Find documents containing elements without a particular attribute (possibly many per document)


I have some data which looks something like this:

<wrapper>
  <inner a="1"/>
  <inner a="2" b="3"/>
</wrapper>

The attribute b may or may not be present on each inner element. My aim is to find all documents containing at least one inner element that doesn't have attribute b.

This similar question proposes the answer:

cts:not-query(cts:element-attribute-value-query(xs:QName('inner'), xs:QName('b'), '*', ("wildcarded")))

but that doesn't work: some inner elements in the same document may have attribute b, and not-queries apply to the entire fragment, so a mixed case like the example above would not be returned. Wrapping it in an element-query doesn't help, and cts:and-not-query seems to behave the same way.

I have also tried attacking the problem with the co-occurrence/values lexicon functions, reading the values of the relevant a attributes, but that also seems to be impossible. It might have been possible with proximity settings on the co-occurrence calls, except that there is no element text, so the attributes are indexed at the same word positions.

Are there any alternatives to the blunt XPath?

//inner[@a and not(@b)]
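For comparison, the blunt XPath does handle the mixed case that defeats cts:not-query. A minimal sketch against the question's sample document, built in memory so it runs on any XQuery processor without a database; the variable names are illustrative:

```xquery
xquery version "1.0";

(: The sample document from the question, held in memory so no database
   is needed. :)
let $doc :=
  <wrapper>
    <inner a="1"/>
    <inner a="2" b="3"/>
  </wrapper>

(: inner elements that have @a but no @b; only the first inner qualifies :)
let $missing-b := $doc//inner[@a and not(@b)]

(: the document qualifies because at least one such element exists :)
return fn:exists($missing-b)
```

For the sample this returns true: the predicate is evaluated per element, so the sibling that does carry b has no effect on the result.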

Solution

  • cts:not-in-query has the necessary behaviour to make this work where cts:and-not-query doesn’t. E.g.

    cts:not-in-query(
      cts:element-query(xs:QName('inner'), cts:true-query()),
      cts:element-attribute-value-query(xs:QName('inner'), xs:QName('b'), '*', 'wildcarded')
    )
    

    Finds all ‘inner’ elements at positions that do not match the positions of ‘inner’ elements with attribute b.

    Both the element position index and a wildcard index must be enabled.

    http://docs.marklogic.com/cts:not-in-query
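To get the matching documents rather than elements, the query can be passed to cts:search. This is a sketch, not part of the original answer: it assumes a MarkLogic version that provides cts:true-query (which the answer already relies on), uses cts:element-attribute-value-query (the attribute value query from the question) for the negative side, and the "unfiltered" option and returning URIs via xdmp:node-uri are illustrative choices:

```xquery
xquery version "1.0-ml";

(: Positive side: every inner element. Negative side: inner elements that
   carry a b attribute with any value. cts:not-in-query keeps positive
   matches whose positions do not coincide with negative matches. :)
let $query :=
  cts:not-in-query(
    cts:element-query(xs:QName("inner"), cts:true-query()),
    cts:element-attribute-value-query(
      xs:QName("inner"), xs:QName("b"), "*", "wildcarded")
  )

(: "unfiltered" resolves the search from the indexes alone :)
for $doc in cts:search(fn:doc(), $query, "unfiltered")
return xdmp:node-uri($doc)
```

Because the search is unfiltered, the results are only as accurate as the position and wildcard index settings described in the answer; omitting the option asks MarkLogic to filter the candidates against the actual documents, at extra cost.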