python, regex, arabic

Arabic regex giving TypeError


I have this simple regex:

    text = re.sub("[إأٱآا]", "ا", text)

However, I get this (Python 2.7) error:

TypeError: expected string or buffer

I'm a regex newbie. I imagine this is simple to fix, but I'm not sure how. Thanks.


Solution

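  • In Python 2.7, re.sub raises this TypeError when its third argument is not a string (for example None or a list), so first check what text actually holds. Then define all your strings as unicode literals (the u prefix) and add the encoding declaration at the top of the file: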

    # -*- coding: utf-8 -*-
    
    import re
    
    # The u prefix makes the pattern, the replacement and the input unicode strings.
    text = re.sub(u"[إأٱآا]", u"ا", u"الآلهة")
    
    print text
    

    To get:

    الالهة
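If the text comes from a file or another function rather than a literal, it may help to wrap the substitution in a small helper that checks the input type first. The sketch below is only an illustration: normalize_alef and ALEF_VARIANTS are made-up names, and it assumes any byte string you pass in is UTF-8 encoded. It is written to run on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

import re

# Same character class as in the question, compiled once.
# With unicode_literals, this is a unicode pattern on Python 2 as well.
ALEF_VARIANTS = re.compile("[إأٱآا]")

try:
    text_type = unicode  # Python 2
except NameError:
    text_type = str  # Python 3


def normalize_alef(text):
    """Hypothetical helper: replace alef variants with bare alef."""
    if isinstance(text, bytes):
        # Assumption: byte strings are UTF-8 encoded.
        text = text.decode("utf-8")
    if not isinstance(text, text_type):
        # Fail with a message that names the offending type.
        raise TypeError("expected text, got %s" % type(text).__name__)
    return ALEF_VARIANTS.sub("ا", text)


print(normalize_alef("الآلهة"))
```

Passing None or a list to this helper raises a TypeError that names the offending type, which should make it easier to see where the non-string value comes from.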