lucene graphdb

How to compute a Lucene FuzzyQuery on top of a GraphDB Lucene index?


GraphDB supports full-text search (FTS) through its Lucene plugin, which builds RDF 'molecules' to index text efficiently. However, when the word you are searching for contains a typo (a misspelling), Lucene does not return any results. I wonder whether it is possible to run a FuzzyQuery, based on the Damerau-Levenshtein algorithm, on top of the Lucene index in GraphDB for FTS. That way, even if the word is misspelled, you would get a list of the closest words based on edit-distance similarity.

This is the index I created to index the labels of NounSynset instances in WordNet RDF:

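# luc: is assumed to be the GraphDB Lucene FTS plugin namespace
PREFIX luc: <http://www.ontotext.com/owlim/lucene#>
PREFIX wn20schema: <http://www.w3.org/2006/03/wn/wn20/schema/>
INSERT DATA {
    # Index URIs, using only literals reachable within one hop (molecule size 1)
    luc:index luc:setParam "uris" .
    luc:include luc:setParam "literals" .
    luc:moleculeSize luc:setParam "1" .
    # Restrict the indexed text to rdfs:label values of NounSynset entities
    luc:includePredicates luc:setParam "http://www.w3.org/2000/01/rdf-schema#label" .
    luc:includeEntities luc:setParam wn20schema:NounSynset .
    # Build the index under the name luc:nounIndex
    luc:nounIndex luc:createIndex "true" .
}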

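When I run the query

# luc: is assumed to be the GraphDB Lucene FTS plugin namespace
PREFIX luc: <http://www.ontotext.com/owlim/lucene#>
select * where {
    { ?id luc:nounIndex "credict" }
    ?id luc:score ?score .
}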

the result is empty. I would like to get at least the word "credit", since its edit distance from "credict" is 1.

Thank you!!!


Solution

  • If you append the Lucene fuzzy operator ~ to the search term (for example "credict~"), it should give you a fuzzy match.
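
As a minimal sketch, the query from the question could be rewritten as follows. It assumes that the string passed to luc:nounIndex is handed to Lucene's query parser unchanged, and that the luc: prefix maps to the GraphDB Lucene plugin namespace; neither is confirmed in the thread, so check both against your GraphDB version.

```sparql
# Assumption: luc: is the GraphDB Lucene FTS plugin namespace
PREFIX luc: <http://www.ontotext.com/owlim/lucene#>

SELECT * WHERE {
    # The trailing ~ asks Lucene for a fuzzy (edit-distance based) match
    # on the term instead of an exact one
    ?id luc:nounIndex "credict~" .
    ?id luc:score ?score .
}
```

If this behaves as expected, labels within a small edit distance of "credict", such as "credit", should be returned along with their scores. How an optional number after the ~ is interpreted differs between Lucene versions, so the bare operator is the safest starting point.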