Python regex: find words and emoticons


I want to find matches between a tweet and a list of strings containing words, phrases, and emoticons. Here is my code:

    words = [':)','and i','sleeping','... :)','! <3','facebook']
    regex = re.compile(r'\b%s\b|(:\(|:\))+' % '\\b|\\b'.join(words), flags=re.IGNORECASE)

I keep receiving this error:

error: unbalanced parenthesis

Apparently there is something wrong with the code and it cannot match emoticons. Any idea how to fix it?


Solution

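  • The error comes from the unescaped ) in entries such as ':)', which the regex engine reads as a closing group. The re module has a function, re.escape, that escapes the regex metacharacters in each word, so you could just use: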

    words = map(re.escape, [':)','and i','sleeping','... :)','! <3','facebook'])

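    Note that word boundaries might not work as you expect for entries that don't start or end with a word character. \b matches only between a word character and a non-word character, so \b:\)\b will not match a ':)' that is surrounded by spaces.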
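One possible way to handle the word-boundary caveat is to add \b only on the side of an entry that actually starts or ends with a word character. This is a minimal sketch rather than a definitive fix: the helper name term_pattern, the sample tweet, and the choice to sort longer entries first (so '... :)' is tried before ':)') are assumptions made for illustration and are not part of the original question.

```python
import re

words = [':)', 'and i', 'sleeping', '... :)', '! <3', 'facebook']

def term_pattern(term):
    # Hypothetical helper: escape the term, then add a word boundary only
    # on the sides where the term begins or ends with a word character.
    escaped = re.escape(term)
    prefix = r'\b' if re.match(r'\w', term) else ''
    suffix = r'\b' if re.search(r'\w$', term) else ''
    return prefix + escaped + suffix

# Try longer entries first so '... :)' wins over the shorter ':)'.
alternatives = sorted(words, key=len, reverse=True)
regex = re.compile('|'.join(term_pattern(w) for w in alternatives),
                   flags=re.IGNORECASE)

# Made-up sample tweet, for illustration only.
tweet = 'Sleeping and I love Facebook ... :) ! <3'
matches = regex.findall(tweet)
print(matches)
```

Because the pattern contains no capturing groups, findall returns the full text of each match. Emoticon entries are matched wherever they occur, while plain words such as 'facebook' still require a word boundary on both sides.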