marklogic marklogic-8

Get the nodes containing the search snippet in a document


Is there a way to get the node that contains a search snippet? For example, I have this sample XML document:

<pdf2xml>
  <page pageNo="1">xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx</page>
  <page pageNo="2">xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx</page>
  <page pageNo="3">xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx</page>
  <page pageNo="4">xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx</page>
</pdf2xml>

How do I get the pageNo for a given search result? I tried the following:

search:snippet(fn:doc($uri), 
  cts:query(search:parse($q, $options)),  
    <transform-results apply="snippet" xmlns="http://marklogic.com/appservices/search">
      <per-match-tokens>30</per-match-tokens> 
      <max-matches>1000</max-matches> 
      <max-snippet-chars>2000</max-snippet-chars>
      <preferred-matches>
        <element name="page" ns=""/>
      </preferred-matches>
    </transform-results>)

This does not return all of the snippets either. What is a good way to do this?


Solution

  • You can find every match in a document, return its containing element, and highlight the matched text with cts:walk and cts:highlight:

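    xquery version "1.0-ml";

    (: Inline sample document; a stored document, e.g. fn:doc($uri), can be walked the same way :)
    let $content := <pdf2xml>
      <page pageNo="1">xxxxxxxxxxxxxx 1 xxxxxxxxx</page>
      <page pageNo="2">xxxxxxxxxxxxxx 2 xxxxx foo xxxxxxxx</page>
      <page pageNo="3">xxxxxxxxxxxxxxx 3 xxxxxxxxxxxxxxxxxxxxxxx</page>
      <page pageNo="4">xxxxxxxxxxxxxxxxx 4 xxxxxxxxxxx foo xxxxxxxxxx</page>
    </pdf2xml>

    let $q := cts:word-query("foo")

    (: cts:walk evaluates its third argument once per match. Inside that
       expression, $cts:node is the matching text node, so its parent is the
       enclosing page element, and $cts:text is the matched text itself. :)
    return <results>
    {cts:walk($content, $q,
      <result>
        <original-node>{$cts:node/parent::*}</original-node>
        <highlighted-content>{cts:highlight($cts:node/parent::*, $q, <matched>{$cts:text}</matched>)}</highlighted-content>
      </result>
      )}
    </results>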
    

    Results in:

    <results>
      <result>
        <original-node>
          <page pageNo="2">xxxxxxxxxxxxxx 2 xxxxx foo xxxxxxxx</page>
        </original-node>
        <highlighted-content>
          <page pageNo="2">xxxxxxxxxxxxxx 2 xxxxx <matched>foo</matched> xxxxxxxx</page>
        </highlighted-content>
      </result>
      <result>
        <original-node>
          <page pageNo="4">xxxxxxxxxxxxxxxxx 4 xxxxxxxxxxx foo xxxxxxxxxx</page>
        </original-node>
        <highlighted-content>
          <page pageNo="4">xxxxxxxxxxxxxxxxx 4 xxxxxxxxxxx <matched>foo</matched> xxxxxxxxxx</page>
        </highlighted-content>
      </result>
    </results>
    

    This may not be exactly what you want, but it shows how much control you have over your results: here, extracting and highlighting content, whether or not it came from a search.
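
    If only the page numbers are needed, the same walk can return the pageNo attribute of each matching page instead of the whole element. This is a minimal sketch rather than a definitive solution: it assumes, as in the sample document, that every match sits directly inside a page element, and it uses fn:distinct-values so that a page with several matches is listed once.

    xquery version "1.0-ml";

    (: Same sample document as in the example above :)
    let $content := <pdf2xml>
      <page pageNo="1">xxxxxxxxxxxxxx 1 xxxxxxxxx</page>
      <page pageNo="2">xxxxxxxxxxxxxx 2 xxxxx foo xxxxxxxx</page>
      <page pageNo="3">xxxxxxxxxxxxxxx 3 xxxxxxxxxxxxxxxxxxxxxxx</page>
      <page pageNo="4">xxxxxxxxxxxxxxxxx 4 xxxxxxxxxxx foo xxxxxxxxxx</page>
    </pdf2xml>

    let $q := cts:word-query("foo")

    (: For each match, take the pageNo attribute of the element that
       contains the matching text node, then drop duplicate page numbers :)
    return fn:distinct-values(
      cts:walk($content, $q, fn:string($cts:node/parent::*/@pageNo))
    )

    For the sample document this should return the page numbers of the two pages that contain "foo", matching the two result elements shown above.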