
How to integrate Metaphone into a Lucene spellchecker program in Java?


While browsing I came across a spellchecking program built on Lucene. I am interested in adding the phonetix add-on from Tangentum (specifically Metaphone). Is there a way to integrate Metaphone into my program, and if so, how?

    package com.lucene.spellcheck;

    import java.io.File;
    import org.apache.lucene.search.spell.PlainTextDictionary;
    import org.apache.lucene.search.spell.SpellChecker;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    public class SimpleSuggestionService {
        public static void main(String[] args) throws Exception {
            // Directory that holds the spellchecker index.
            File dir = new File("e:/spellchecker/");
            Directory directory = FSDirectory.open(dir);

            // Build the spell index from a plain-text word list (one word per line).
            SpellChecker spellChecker1 = new SpellChecker(directory);
            spellChecker1.indexDictionary(
                    new PlainTextDictionary(new File("c:/fulldictionary00.txt")));

            String wordForSuggestions = "noveil";
            int suggestionsNumber = 5;
            String[] suggestions =
                    spellChecker1.suggestSimilar(wordForSuggestions, suggestionsNumber);

            if (suggestions != null && suggestions.length > 0) {
                for (String word : suggestions) {
                    System.out.println("Did you mean:" + word);
                }
            } else {
                System.out.println("No suggestions found for word:" + wordForSuggestions);
            }
        }
    }

Solution

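  • You can pass in a custom StringDistance implementation that uses the phonetic algorithm you want, or combines it in some way with other similarity algorithms (such as the standard LevensteinDistance). Hand it to the SpellChecker either through the SpellChecker(Directory, StringDistance) constructor or through setStringDistance. You'll just need to implement the getDistance(String, String) method in your StringDistance implementation, returning a value between 0 and 1 where 1 means the two strings are identical. Perhaps something like: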

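    import org.apache.lucene.search.spell.StringDistance;
    import com.tangentum.phonetix.Metaphone; // package name assumed; check the phonetix jar

    public class MetaphoneDistance implements StringDistance {
        private final Metaphone metaphone = new Metaphone();

        // I'm not really familiar with the library you mentioned, but I assume
        // generateKeys performs a double metaphone and returns two keys per word?
        public float getDistance(String str1, String str2) {
            String[] keys1 = metaphone.generateKeys(str1);
            String[] keys2 = metaphone.generateKeys(str2);
            // Guard against encoders that return fewer than two keys.
            if (keys1.length < 2 || keys2.length < 2) return 0;
            float result = 0;
            // Compare with equals(); == would only test reference identity.
            if (keys1[0].equals(keys2[0]) || keys1[0].equals(keys2[1])) result += .5f;
            if (keys1[1].equals(keys2[0]) || keys1[1].equals(keys2[1])) result += .5f;
            return result;
        }
    }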
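
The other option mentioned above, combining a phonetic match with an edit-distance measure, might look roughly like the sketch below. It is only an illustration under several assumptions: SimilarityScore is a hypothetical stand-in for Lucene's StringDistance so the code compiles without any jars, a Soundex-style key is a placeholder for Metaphone, and the BlendedDistance name and the 0.5 weight are made up for the example.

```java
import java.util.Locale;

// Hypothetical stand-in for Lucene's StringDistance so the sketch compiles on its own.
interface SimilarityScore {
    float getDistance(String s1, String s2); // 1.0 = identical, 0.0 = maximally different
}

class BlendedDistance implements SimilarityScore {
    private final float phoneticWeight; // share of the score given to the phonetic match

    BlendedDistance(float phoneticWeight) {
        this.phoneticWeight = phoneticWeight;
    }

    @Override
    public float getDistance(String s1, String s2) {
        float phonetic = soundex(s1).equals(soundex(s2)) ? 1f : 0f;
        return phoneticWeight * phonetic + (1f - phoneticWeight) * editSimilarity(s1, s2);
    }

    // Soundex-style key used as a placeholder; a Metaphone encoder would go here instead.
    static String soundex(String word) {
        String codes = "01230120022455012623010202"; // digit per letter a..z, 0 = not coded
        String w = word.toLowerCase(Locale.ROOT).replaceAll("[^a-z]", "");
        if (w.isEmpty()) return "";
        StringBuilder key = new StringBuilder().append(Character.toUpperCase(w.charAt(0)));
        char prev = codes.charAt(w.charAt(0) - 'a');
        for (int i = 1; i < w.length() && key.length() < 4; i++) {
            char c = w.charAt(i);
            char code = codes.charAt(c - 'a');
            if (code != '0' && code != prev) key.append(code);
            if (c != 'h' && c != 'w') prev = code; // h and w do not separate equal codes
        }
        while (key.length() < 4) key.append('0');
        return key.toString();
    }

    // Levenshtein edit distance scaled to [0, 1], where 1 means identical strings.
    static float editSimilarity(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        if (longest == 0) return 1f;
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1], prev[j]) + 1, prev[j - 1] + cost);
            }
            int[] swap = prev; prev = curr; curr = swap; // reuse the two row buffers
        }
        return 1f - (float) prev[b.length()] / longest;
    }

    public static void main(String[] args) {
        SimilarityScore score = new BlendedDistance(0.5f);
        for (String candidate : new String[] {"novel", "navel", "nail"}) {
            System.out.println(candidate + " -> " + score.getDistance("noveil", candidate));
        }
    }
}
```

In a real Lucene setup the class would implement org.apache.lucene.search.spell.StringDistance instead of the stand-in interface, and the weight would need tuning against your own dictionary.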