python, regex, string, nlp, topic-modeling

Is there a way to use regular expressions from a list?


I have a list of topic words, some of which are regular expressions:

Emotion_Keywords_list = [r'peaceful', r'thank\s?god', r'infuriated', r'mood\s?dropped']

And a dictionary mapping each topic word to its topic:

Emotion_dict = {r'peaceful': 'Restful', r'thank\s?god': 'Thankful', r'infuriated': 'Angry', r'mood\s?dropped': 'Sad'}

The goal is to write a function that determines whether a string matches any regular expression in the list, and returns the matching topic.

Some strings may match more than one topic, so the function needs to keep all matched topics. Matching should also be case-insensitive.

I tried:

import re
def topic_emotion(text):
    text_lower=text.lower()
    output = []
    for elem in Emotion_Keywords_list:
        if bool(re.search(elem, text_lower)):
            output.append(Emotion_dict[elem])
    return output

For example:

topic_emotion('todaypeacefulday Im INFURIATED')  # expected: ['Restful', 'Angry']

But this seems wrong, and I am not sure it handles the regular expression cases. Are there any other factors that I should consider?


Solution

  • The above code works fine on Python 3.7

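    Regex patterns involve many special characters. A better approach is to compile each pattern before use; passing flags=re.I also makes the match case-insensitive, so the text does not need to be lowercased first.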

    import re

    # Raw strings keep backslashes such as \s intact.
    Emotion_Keywords_list = [r'peaceful', r'thank\s?god', r'infuriated', r'mood\s?dropped']

    topic = ['Restful', 'Thankful', 'Angry', 'Sad']

    def topic_emotion(text):
        output = []
        # Compile each pattern once, case-insensitively.
        compiled_keywords = [re.compile(elem, flags=re.I) for elem in Emotion_Keywords_list]

        # Compiled patterns are hashable, so they can be used as dictionary keys.
        emotion_dict = dict(zip(compiled_keywords, topic))

        for elem in compiled_keywords:
            # Search with the compiled pattern itself.
            if elem.search(text):
                output.append(emotion_dict[elem])
        return output
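
As a variation on the function above, the patterns could be compiled once when the module loads instead of on every call. This is only a minimal sketch: the names EMOTION_PATTERNS and COMPILED_PATTERNS are hypothetical, and it assumes the keyword-to-topic mapping from the question.

```python
import re

# Hypothetical name: pattern string -> topic, mirroring the dictionary in the question.
EMOTION_PATTERNS = {
    r'peaceful': 'Restful',
    r'thank\s?god': 'Thankful',
    r'infuriated': 'Angry',
    r'mood\s?dropped': 'Sad',
}

# Compile once at import time; re.IGNORECASE removes the need for text.lower().
COMPILED_PATTERNS = [
    (re.compile(pattern, re.IGNORECASE), topic)
    for pattern, topic in EMOTION_PATTERNS.items()
]


def topic_emotion(text):
    """Return every topic whose pattern occurs anywhere in text."""
    return [topic for pattern, topic in COMPILED_PATTERNS if pattern.search(text)]


print(topic_emotion('todaypeacefulday Im INFURIATED'))  # ['Restful', 'Angry']
print(topic_emotion('Thank God, my mood dropped'))      # ['Thankful', 'Sad']
```

Because every pattern is tested against the text, a string that matches several patterns returns all of their topics, in the order the patterns were defined.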