
Full-text XQuery (Lucene/KWIC) doesn't work on "tagged" result. eXist-db bug?


After reading the XQuery and eXist-db documentation, I still can't figure this out: the full-text search with KWIC doesn't work if the result is wrapped in a tag.

Explanations

XML file

<root>
    <node>blablabla</node>
    <node>blab KEYWORD labla</node>
    <node>blablabla</node>
</root>

Index configuration (collection.xconf)

<collection xmlns="http://exist-db.org/collection-config/1.0">
    <index xmlns:xs="http://www.w3.org/2001/XMLSchema">
        <lucene>
            <text qname="root"/>
            <text qname="node"/>
        </lucene>
    </index>
</collection>

XQuery without "tagged" result (it works) (look at return $node)

let $my_texts := 
    for $node in collection("path_to_my_collection")//node
    return
        $node

for $my_hit in $my_texts[ft:query(., "KEYWORD")]
return 
    $my_hit

The XQuery code above works and I get one result:

1
<node>blab KEYWORD labla</node>

But it doesn't work when the intermediate result on which the full-text search is run has first been wrapped in a tag. (My whole query is more complex, and I need to wrap this result in the tag so that I can use it elsewhere in my code.)

XQuery with "tagged" result (it doesn't work) (look at return <tag>{$node}</tag>)

let $my_texts := 
    for $node in collection("path_to_my_collection")//node
    return
        <tag>{$node}</tag>

for $my_hit in $my_texts[ft:query(., "KEYWORD")]
return 
    $my_hit

This query returns 0 results.

When I debug like this:

XQuery for debugging

let $my_texts := 
    for $node in collection("path_to_my_collection")//node
    return
        <tag>{$node}</tag>

return 
    $my_texts

I get this:

1
<tag>
    <node>blablabla</node>
</tag>

2
<tag>
    <node>blab KEYWORD labla</node>
</tag>

3
<tag>
    <node>blablabla</node>
</tag>

What I tried:

  • different path combinations: $my_texts/tag[ft:query(., "KEYWORD")], $my_texts/tag/node[ft:query(., "KEYWORD")], $my_texts/*[ft:query(., "KEYWORD")], $my_texts/tag//*[ft:query(., "KEYWORD")], $my_texts//*//*[ft:query(., "KEYWORD")] etc...
  • add <tag> in the Index configuration (<text qname="tag"/>)

What did I miss? Or is it an eXist-db bug? (My eXist-db version: 4.7.0)

UPDATE:

  1. Thanks to a suggestion from the eXist-db mailing list.

The problem may lie in the absence of an index on this intermediate internal result (return <tag>{$node}</tag>). Even if <tag> is added to the index configuration, this <tag> does not exist at the time the index is built. If this is the problem, the question is how to put an index on the intermediate internal result. Is it even possible? Maybe someone has some leads? The eXist-db documentation is not very helpful here. The closest thing I have found is: https://exist-db.org/exist/apps/doc/lucene#constructed-fields

  2. Why do I even need to put this <tag> here?

I have two collections with quite similar data but different XML schemas, so I have to query them separately (but I need a common result). For now I run one full-text query on each collection and then combine the results. My goal is optimization: go from two full-text queries (slow) to only one (fast). To do this, I 1) select from each collection the files that meet my criteria; 2) extract the data I need from the selected files (from both collections); 3) construct a combined intermediate internal result from this data (here I put <tag> around the part of this result that I want to search); 4) run a single full-text query on this combined intermediate internal result. Maybe I'm wrong and this approach is not the most optimised…
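A minimal sketch of steps 1) to 4) written as one full-text predicate over a combined sequence of stored nodes, with no <tag> wrapper in between. The collection paths and the para element name are hypothetical placeholders, and it is an assumption that both element names have a Lucene <text qname="..."/> entry in the collection.xconf of their own collection. This has not been tested against eXist-db 4.7.0, and whether it is actually faster than two separate queries would need to be measured.

```xquery
xquery version "3.1";

(: Hypothetical paths and element names: replace them with the real ones.
   Both "node" and "para" are assumed to be Lucene-indexed in the
   collection.xconf of their respective collection. :)
let $candidates :=
    (
        collection("/db/path_to_collection_a")//node,
        collection("/db/path_to_collection_b")//para
    )

(: The predicate is evaluated on stored nodes, not on constructed copies :)
for $hit in $candidates[ft:query(., "KEYWORD")]
return
    $hit
```

The point of the sketch is only that the sequence handed to ft:query still consists of nodes that live in the database; any wrapping or reshaping of the data happens after the match.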

UPDATE AND ANSWER

Thanks to the eXist-db mailing list community and especially to Joe. The answer is:

[...] the newly constructed element has no connection to the original one (i.e., the wrapped node loses its identity), and you are no longer able to query it using the full text index [...]

See full answer and possible workarounds here: https://sourceforge.net/p/exist/mailman/message/37170946/

I would like to mark this question as answered, but I don't want to post the answer as my own; the credit goes to Joe from the eXist-db mailing list.


Solution

  • The newly constructed element has no connection to the original one (i.e., the wrapped node loses its identity). Thus, you are no longer able to query it using the full text index.

    (While redundant, I'm adding this so the answer is registered. As noted, the full discussion happened in https://sourceforge.net/p/exist/mailman/message/37170946/.)
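One way to act on this, sketched here as an illustration and not necessarily one of the workarounds discussed in the linked thread, is to run ft:query on the stored node elements first and wrap each hit in <tag> only afterwards. The sketch reuses the collection path and keyword from the question; the KWIC call uses the kwic:summarize function from eXist-db's KWIC module, and the width of 40 is an arbitrary choice.

```xquery
xquery version "3.1";

import module namespace kwic="http://exist-db.org/xquery/kwic";

(: Query the stored <node> elements first, while the Lucene index still applies :)
for $hit in collection("path_to_my_collection")//node[ft:query(., "KEYWORD")]
return
    (: Wrap only after matching; <tag> is then just a container for the output :)
    <tag>
        { $hit }
        { kwic:summarize($hit, <config width="40"/>) }
    </tag>
```

Because the KWIC summary is computed from the original hit before it is copied into <tag>, the match information is still available at that point. The copy of the node inside <tag> is a new in-memory node and cannot be searched again with ft:query.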