Tags: lucene, highlighting, jackrabbit

Jackrabbit - why does search excerpt contain all node properties concatenated?


When I perform a Jackrabbit (version 2.2.9) search and call row.getValue("rep:excerpt()"), the returned string is just all the properties (excluding jcr: properties) concatenated. How do I control this? For example, if I have a property called "description" containing "bla foo bla", when I search for "foo" I would like rep:excerpt() to return part of just the description.
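The behaviour described above can be modelled in a few lines. This is a simplified sketch, not Jackrabbit's implementation: the class `NodeScopeText` and its `nodeScopeText` method are hypothetical, and the only rule they encode (skip `jcr:` properties, join the rest) is the one observed in the question. If the excerpt is cut from such a joined text, as the observed output suggests, a hit in `description` will be shown together with the text of every other property of the node.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Simplified model (hypothetical, not Jackrabbit code) of the text the
// question's excerpts appear to come from: all non-jcr: property values of a
// node joined into one string, in property order.
final class NodeScopeText {

    static String nodeScopeText(Map<String, String> properties) {
        StringJoiner joined = new StringJoiner(" ");
        for (Map.Entry<String, String> e : properties.entrySet()) {
            // jcr: properties are left out, matching what the question observes.
            if (!e.getKey().startsWith("jcr:")) {
                joined.add(e.getValue());
            }
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the output order is predictable.
        Map<String, String> props = new LinkedHashMap<>();
        props.put("jcr:primaryType", "nt:teneoNode");
        props.put("description", "bla foo bla");
        props.put("key", "k1");
        props.put("comment", "no match here");
        System.out.println(nodeScopeText(props));
    }
}
```

Running `main` prints the description, key and comment values on one line, which is the kind of string the question reports getting back from `rep:excerpt()`.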

I tried creating an indexing configuration (deleting my repository between tests) to control which properties are indexed, but to no avail.

workspace.xml:

<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
  <param name="path" value="${wsp.home}/index"/>
  <param name="supportHighlighting" value="true"/>
  <param name="excerptProviderClass" value="org.apache.jackrabbit.core.query.lucene.DefaultHTMLExcerpt"/>
  <param name="indexingConfiguration" value="${wsp.home}/indexing_configuration.xml"/>
</SearchIndex>

indexing_configuration.xml:

<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM "http://jackrabbit.apache.org/dtd/indexing-configuration-1.0.dtd">
<configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
  <index-rule nodeType="nt:teneoNode">
    <property>description</property>
    <property>input</property>
    <property>key</property>
    <property>comment</property>
  </index-rule>
</configuration>

Thanks.

Ted.


Solution

  • You can configure the ExcerptProvider implementation (see its Javadoc), which is responsible for the rep:excerpt() functionality, in the SearchIndex element of your workspace.xml file:

     <param name="excerptProviderClass" value="org.apache.jackrabbit.core.query.lucene.DefaultHTMLExcerpt"/>
    

    You might need to plug in your own implementation for your specific needs.

    There is also some (unfortunately rather old) information on the Jackrabbit Wiki.
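If you do write your own provider, the part that differs from the default behaviour is choosing which text to cut the excerpt from. The following is a minimal sketch of that step only, assuming your provider can already look up the `description` value of the matching node. It does not implement Jackrabbit's `ExcerptProvider` interface, and `SinglePropertyExcerpt`, `excerpt` and `escape` are hypothetical names. It finds the first case-insensitive match of a single query term, keeps about `maxFragmentSize` characters of text around it, and wraps the match in `<strong>`; the `<div><span>` wrapper is an illustrative assumption, not a documented format.

```java
// Hypothetical helper (not Jackrabbit API): builds an HTML excerpt from ONE
// property value instead of from the node's concatenated fulltext.
final class SinglePropertyExcerpt {

    /** Returns "" when the term does not occur in propertyValue. */
    static String excerpt(String propertyValue, String term, int maxFragmentSize) {
        if (propertyValue == null || term == null || term.isEmpty()) {
            return "";
        }
        // Case-insensitive search that keeps indices aligned with the original text.
        int hit = -1;
        for (int i = 0; i + term.length() <= propertyValue.length(); i++) {
            if (propertyValue.regionMatches(true, i, term, 0, term.length())) {
                hit = i;
                break;
            }
        }
        if (hit < 0) {
            return "";
        }
        int matchEnd = hit + term.length();
        // Spread the remaining character budget evenly before and after the match.
        int context = Math.max(0, (maxFragmentSize - term.length()) / 2);
        int start = Math.max(0, hit - context);
        int end = Math.min(propertyValue.length(), matchEnd + context);
        return "<div><span>"
                + (start > 0 ? "... " : "")
                + escape(propertyValue.substring(start, hit))
                + "<strong>" + escape(propertyValue.substring(hit, matchEnd)) + "</strong>"
                + escape(propertyValue.substring(matchEnd, end))
                + (end < propertyValue.length() ? " ..." : "")
                + "</span></div>";
    }

    // Minimal HTML escaping so property text cannot inject markup into the excerpt.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        System.out.println(excerpt("bla foo bla", "foo", 40));
    }
}
```

With the question's example, `SinglePropertyExcerpt.excerpt("bla foo bla", "foo", 40)` returns `<div><span>bla <strong>foo</strong> bla</span></div>`. The search uses `regionMatches` rather than lower-casing the whole string, because lower-casing can change the string length for some characters and would misalign the indices used for cutting.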