How to create a custom AnalyzerFactory in GraphDB full text search?


I am using GraphDB 8.1 Free. The documentation at http://graphdb.ontotext.com/documentation/free/full-text-search.html says that I can enable a custom AnalyzerFactory for GraphDB full-text search through the luc:analyzer parameter, by implementing the interface com.ontotext.trree.plugin.lucene.AnalyzerFactory. However, I can't find this interface anywhere; it is not in graphdb-free-runtime-8.1.0.jar.

I checked the feature matrix at http://ontotext.com/products/graphdb/editions/#feature-comparison-table and it seems that the "Connectors Lucene" feature is available in the free edition of GraphDB.

Which jar contains the com.ontotext.trree.plugin.lucene.AnalyzerFactory interface? What do I need to import into my project to implement it?

Are there any pre-existing AnalyzerFactory implementations included with GraphDB for using other Lucene analyzers? (I am interested in using a FrenchAnalyzer.)

Thanks!


Solution

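  • GraphDB offers two different Lucene-based plugins: the full-text search plugin described on the page you linked, which indexes RDF molecules, and the Lucene Connector.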

    I encourage you to use the Lucene Connector unless you have a special case that requires RDF molecules. Here is a simple example of how to configure the connector with the French analyzer and index all rdfs:label values for resources of type urn:MyClass. Select a repository and execute the following in the SPARQL query view:

      PREFIX :<http://www.ontotext.com/connectors/lucene#>
      PREFIX inst:<http://www.ontotext.com/connectors/lucene/instance#>
      INSERT DATA {
        inst:labelFR :createConnector '''
      {
        "fields": [
          {
            "indexed": true,
            "stored": true,
            "analyzed": true,
            "multivalued": true,
            "fieldName": "label",
            "propertyChain": [
              "http://www.w3.org/2000/01/rdf-schema#label"
            ],
            "facet": true
          }
        ],
        "types": [
          "urn:MyClass"
        ],
        "stripMarkup": false,
        "analyzer": "org.apache.lucene.analysis.fr.FrenchAnalyzer"
      }
      ''' .
      }
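
    Before importing data, it can help to confirm that the connector was actually created. A minimal sketch, assuming the :connectorStatus predicate described in the Lucene Connector documentation; ?cntUri and ?cntStatus are just variable names chosen here:

    ```sparql
    PREFIX : <http://www.ontotext.com/connectors/lucene#>

    # Lists every Lucene connector in the repository together with its status.
    # Assumes the :connectorStatus predicate from the connector documentation.
    SELECT ?cntUri ?cntStatus {
        ?cntUri :connectorStatus ?cntStatus .
    }
    ```

    The connector you created should appear among the ?cntUri values.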
    

    Then manually add some sample test data from Import > Text area:

    <urn:instance:test>  <http://www.w3.org/2000/01/rdf-schema#label> "C'est une example".
    <urn:instance:test> a <urn:MyClass>.
    

    Once you commit the transaction, the Connector will update the Lucene index. Now you can run search queries like:

    PREFIX : <http://www.ontotext.com/connectors/lucene#>
    PREFIX inst: <http://www.ontotext.com/connectors/lucene/instance#>
    SELECT ?entity ?snippetField ?snippetText {
        ?search a inst:labelFR ;
                :query "label:*" ;
                :entities ?entity .
        ?entity :snippets _:s .
        _:s :snippetField ?snippetField ;
            :snippetText ?snippetText .
    }
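
    To search for a specific word rather than everything, put a term after the field name. This is only a sketch: it assumes the sample label above and the :score predicate from the connector documentation. The query text is expected to go through the connector's analyzer, so what matches depends on the French stemming and stop-word rules:

    ```sparql
    PREFIX : <http://www.ontotext.com/connectors/lucene#>
    PREFIX inst: <http://www.ontotext.com/connectors/lucene/instance#>

    # Term query against the "label" field; "example" is taken from the sample data.
    # ?score is the Lucene relevance score (assumes the :score predicate).
    SELECT ?entity ?score {
        ?search a inst:labelFR ;
                :query "label:example" ;
                :entities ?entity .
        ?entity :score ?score .
    }
    ```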
    

    To create a custom analyzer, follow the instructions in the documentation and extend the org.apache.lucene.analysis.Analyzer class. Put the JAR containing the custom analyzer in the lib/plugins/lucene-connector/ directory.
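
    Once the JAR is in place (GraphDB probably needs a restart to pick it up, which is an assumption on my part), the connector refers to the analyzer by its fully qualified class name. In this sketch com.example.MyFrenchAnalyzer and inst:labelCustom are hypothetical names; substitute your own:

    ```sparql
    PREFIX : <http://www.ontotext.com/connectors/lucene#>
    PREFIX inst: <http://www.ontotext.com/connectors/lucene/instance#>

    # Hypothetical names: inst:labelCustom (connector instance) and
    # com.example.MyFrenchAnalyzer (your own Analyzer subclass from the JAR).
    INSERT DATA {
      inst:labelCustom :createConnector '''
    {
      "fields": [
        {
          "fieldName": "label",
          "propertyChain": [
            "http://www.w3.org/2000/01/rdf-schema#label"
          ]
        }
      ],
      "types": [
        "urn:MyClass"
      ],
      "analyzer": "com.example.MyFrenchAnalyzer"
    }
    ''' .
    }
    ```

    Only the "analyzer" value differs in kind from the French example; the remaining field options are left at their defaults here.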