Tags: java, codec, phonetics, metaphone

Why is Java's Double Metaphone only giving four letter codes?


I want to use DoubleMetaphone from Apache Commons Codec to get a phonetic encoding of a given string. For example:

import org.apache.commons.codec.language.DoubleMetaphone;

String s1 = "computer";
String code = new DoubleMetaphone().doubleMetaphone(s1);  // primary encoding

Result: computer -> KMPT

The issue arises when I try to encode longer strings.

import org.apache.commons.codec.language.DoubleMetaphone;

String s1 = "dustinhoffmanisanactor";
String code = new DoubleMetaphone().doubleMetaphone(s1);  // primary encoding

Result: dustinhoffmanisanactor -> TSTN

Clearly it stops as soon as it has produced 4 encoded characters. In this case only "dustin" is encoded: dustin -> TSTN.

I tried the Python implementation of Double Metaphone on the same input, and it returns the full code as expected.

>>> from metaphone import doublemetaphone
>>> doublemetaphone("dustinhoffmanisanactor")[0]
"TSTNFMNSNKTR"

Solution

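  • It turns out I needed to set the max code length. In Apache Commons Codec, DoubleMetaphone limits the encoded result to 4 characters by default.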

    String s1 = "dustinhoffmanisanactor";
    DoubleMetaphone dm = new DoubleMetaphone();
    dm.setMaxCodeLen(100);  // raise the limit on the encoded length
    String code = dm.doubleMetaphone(s1);

    This gives the expected TSTNFMNSNKTR.
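
To see why the two settings give TSTN and TSTNFMNSNKTR, the limit can be modeled as cutting the full code off after the maximum number of characters. The sketch below is plain Java with no Commons Codec dependency. The class MaxCodeLenDemo and the helper capCode are hypothetical names made up for illustration, and the helper only mimics the visible effect of the limit, not the library's internal logic. The full code is the one reported above.

```java
// Hypothetical model of a max-code-length cap; not Commons Codec's implementation.
class MaxCodeLenDemo {

    // Keep at most maxCodeLen characters of an already-computed phonetic code.
    static String capCode(String fullCode, int maxCodeLen) {
        int end = Math.max(0, Math.min(fullCode.length(), maxCodeLen));
        return fullCode.substring(0, end);
    }

    public static void main(String[] args) {
        // Full primary code for "dustinhoffmanisanactor", as reported in the post.
        String fullCode = "TSTNFMNSNKTR";

        System.out.println(capCode(fullCode, 4));    // TSTN (cap of 4, the library default)
        System.out.println(capCode(fullCode, 100));  // TSTNFMNSNKTR (cap larger than the code)
    }
}
```

Any limit at least as long as the full code leaves it unchanged, so 100 is simply a comfortably large value for inputs of this size, not a special number.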