
unbalanced parenthesis regex


!pip install emot
import re
from emot.emo_unicode import EMOTICONS_EMO
def convert_emoticons(text):
    for emot in EMOTICONS_EMO:
        text = re.sub(u'\('+emot+'\)', "_".join(EMOTICONS_EMO[emot].replace(",","").split()), text)
        return text

text = "Hello :-) :-)"
convert_emoticons(text)

I'm trying to run the above code in Google Colab, but it raises the following error: unbalanced parenthesis at position 4

My understanding from the re module documentation is that '\(' + any_expression + '\)' is the correct way to do this, but I still get the error. So I have tried replacing '\(' + emot + '\)' with:

  1. '(' + emot + ')', it gives the same error
  2. '[' + emot + ']', it gives the following output: Hello Happy_face_or_smiley-Happy_face_or_smiley Happy_face_or_smiley-Happy_face_or_smiley

The correct output should be Hello Happy_face_smiley Happy_face_smiley for text = "Hello :-) :-)"

Can someone help me fix the problem?


Solution

  • This is pretty tricky with regex, because you'd first need to escape the regex metacharacters contained in the emoticons, such as the parentheses in :) and :(. An unescaped ) inside the pattern is what triggers the unbalanced parenthesis error. So you'd need to do something like this first:

    >>> print(re.sub(r'([()...])', r'%s\1' % '\\\\', ':)'))
    :\)
    

    But I'd suggest just doing a straight string replacement, since you already have a mapping that you're iterating through. So we'd have:

    from emot.emo_unicode import EMOTICONS_EMO
    def convert_emoticons(text):
        for emot in EMOTICONS_EMO:
            text = text.replace(emot, EMOTICONS_EMO[emot].replace(" ","_"))
        return text
    
    
    text = "Hello :-) :-)"
    convert_emoticons(text)
    # 'Hello Happy_face_smiley Happy_face_smiley'
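
If a regex is still preferred over the straight replacement above, the standard library's `re.escape` can do the metacharacter escaping. The following is only a minimal sketch under some assumptions: `SAMPLE_EMOTICONS` is a hypothetical stand-in for `EMOTICONS_EMO` (so the example runs without the emot package), and `build_pattern` and `describe` are illustrative names, not part of any library. It joins the escaped emoticons into a single alternation, longest first, so a longer emoticon is tried before any shorter one it begins with.

```python
import re

# Hypothetical stand-in for emot's EMOTICONS_EMO (assumption, not the real data).
SAMPLE_EMOTICONS = {
    ":-)": "Happy face smiley",
    ":)": "Happy face smiley",
    ":-(": "Frown, sad",
    ":(": "Frown, sad",
}

# Without escaping, the ")" inside the emoticon closes a group that was never opened.
try:
    re.compile(r"\(" + ":-)" + r"\)")
    unescaped_compiles = True
except re.error:
    unescaped_compiles = False


def build_pattern(mapping):
    # Longest keys first, so the alternation tries longer emoticons before shorter ones.
    keys = sorted(mapping, key=len, reverse=True)
    # re.escape makes every character of the emoticon match literally.
    return re.compile("|".join(re.escape(key) for key in keys))


def convert_emoticons(text, mapping=SAMPLE_EMOTICONS):
    pattern = build_pattern(mapping)

    def describe(match):
        # Same cleanup as in the question: drop commas, join words with underscores.
        return "_".join(mapping[match.group(0)].replace(",", "").split())

    return pattern.sub(describe, text)


result = convert_emoticons("Hello :-) :-)")
print(result)  # Hello Happy_face_smiley Happy_face_smiley
```

Because all emoticons are replaced in one pass, text that has already been substituted is never scanned again, which a loop of successive replacements cannot guarantee.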