PyEnchant weird behavior for numbers


I am using PyEnchant in some spelling/grammar-correction scripts, and I have noticed this behavior on my Mac:

>>> import enchant
>>> d  = enchant.Dict('en_us')
>>> d.suggest('50')
['W', 'Y', 'w', 'y', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'X', 'Z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'x', 'z']
>>> enchant.__version__
'1.6.6'

However, it behaves more predictably on my Linux machine (same version of PyEnchant):

>>> import enchant
>>> d = enchant.Dict('en_us')
>>> d.suggest('50')
['5', '0', '50s']

Solution

  • The difference comes from the underlying spell-checking provider (backend), not from PyEnchant itself. On Ubuntu I have an en_US dictionary installed for both Myspell and Aspell, and switching providers gives different results. For example, with a script like this:

    import enchant

    # Request the same en_US dictionary twice, preferring a different
    # provider each time, and compare the suggestions for '50'.
    for ordering in ("myspell,aspell", "aspell,myspell"):
        b = enchant.Broker()
        b.set_ordering("en_US", ordering)
        print(b.describe())      # every provider Enchant found
        d = b.request_dict("en_US")
        print(d.provider)        # the provider actually serving en_US
        print(d.suggest("50"))
    

    I get the following output.

    [<Enchant: Aspell Provider>, <Enchant: Ispell Provider>, <Enchant: Myspell Provider>, <Enchant: Hspell Provider>]
    <Enchant: Myspell Provider>
    ['5', '0', '50s']
    [<Enchant: Aspell Provider>, <Enchant: Ispell Provider>, <Enchant: Myspell Provider>, <Enchant: Hspell Provider>]
    <Enchant: Aspell Provider>
    ['W', 'Y', 'w', 'y', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'X', 'Z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'x', 'z']
    

    The first set of suggestions, produced by the Myspell provider, is what you are seeing on Linux. The second, produced by the Aspell provider, is what you are seeing on your Mac, so your Mac is most likely resolving en_US to Aspell.
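    If you need the same suggestions on both machines, one option is to pin the provider ordering yourself instead of relying on each system's default. Below is a minimal sketch, assuming PyEnchant's Broker.set_ordering and Broker.request_dict as used in the script above, and assuming Dict.provider exposes a .name attribute. The helper names request_pinned_dict and provider_name are hypothetical, and which provider names are available depends on how Enchant was built on each machine.

```python
# Sketch (assumption): pin Enchant's provider ordering so the same backend
# answers suggest() on every machine. Relies on PyEnchant's Broker API
# (set_ordering, request_dict) and on Dict.provider having a .name.

def request_pinned_dict(broker, tag, ordering):
    """Set the provider preference for `tag` on `broker`, then request it.

    `ordering` is a comma-separated provider list, e.g. "myspell,aspell".
    """
    broker.set_ordering(tag, ordering)
    return broker.request_dict(tag)


def provider_name(dictionary):
    """Return the backend name (e.g. 'myspell'), or None if not exposed."""
    provider = getattr(dictionary, "provider", None)
    return getattr(provider, "name", None)


if __name__ == "__main__":
    try:
        import enchant
    except ImportError:
        print("PyEnchant is not installed")
    else:
        # Same comparison as the script above, but reporting only the name.
        for ordering in ("myspell,aspell", "aspell,myspell"):
            d = request_pinned_dict(enchant.Broker(), "en_US", ordering)
            print(ordering, "->", provider_name(d), d.suggest("50"))
```

    If the preferred provider has no en_US dictionary installed, Enchant should fall back to the next provider in the list, so checking provider_name(d) at startup is a cheap way to confirm which backend you actually got.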