python dataframe dictionary nlp keyword-search

Keywords search in text column of data frame using dictionary


I am new to Python and am stuck on a very specific requirement because of my limited knowledge. I would appreciate it if someone could help with this.

I have generated a dictionary from Excel which looks like this:

dict = {'Fruit' : {'Comb Words' : ['yellow',
                                   'elongated',
                                   'cooking'],
                   'Mandatory Word' : ['banana',
                                       'banana',
                                       'banana']},
       'Animal' : {'Comb Words' : ['mammal',
                                   'white',
                                   'domestic'],
                  'Mandatory Word' : ['cat',
                                      'cat',
                                      'cat']}}

Now, I have a dataframe which has a text column and I want to match keywords from this dictionary with that column. For example:

            Text                     Mandatory      Comb            Final
A white domestic cat is playing        cat       domestic,white     Animal
yellow banana is not available        banana       yellow           Fruit

This dictionary is just an idea. Since it comes from Excel, I can change its format, so any other structure or approach that produces the output above would be fine.


Solution

  • Using user-defined function:

    import pandas as pd
    
    Dict = {'Fruit' : {'Comb Words' : ['yellow',
                                       'elongated',
                                       'cooking'],
                       'Mandatory Word' : ['banana',
                                           'banana',
                                           'banana']},
           'Animal' : {'Comb Words' : ['mammal',
                                       'white',
                                       'domestic'],
                      'Mandatory Word' : ['cat',
                                          'cat',
                                          'cat']}}
                                          
    df = pd.DataFrame({'Text':['A white domestic cat is playing',
                                'yellow banana is not available']})
    
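    def findMCF(sentence):
        words = sentence.split()
        for mand in words:
            for final in Dict:
                wordtypeDict = Dict[final]
                mandList = wordtypeDict['Mandatory Word']
                if mand in mandList:
                    # collect the combination words of this category found in the sentence
                    C = [word for word in words if word in wordtypeDict['Comb Words']]
                    return (mand, ','.join(C), final)
        # no mandatory word found: return empty values so the zip() unpacking below still works
        return (None, None, None)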
    
    df['Mandatory'],df['Comb'],df['Final'] = zip(*df['Text'].map(findMCF))
    
    print(df)
    

    Output:

                                  Text Mandatory            Comb   Final
    0  A white domestic cat is playing       cat  white,domestic  Animal
    1   yellow banana is not available    banana          yellow   Fruit
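
  • Since the question says the dictionary format can change, here is a rough sketch of a variant with a flatter layout. The names `RULES` and `find_keywords`, and the `'mandatory'` / `'comb'` keys, are hypothetical and not part of the original code. It assumes each category has exactly one mandatory word, that matching should ignore case and surrounding punctuation, and that rows without a match should yield empty values. Unlike `findMCF`, it checks categories in dictionary order rather than in the order the words appear in the sentence.

```python
import string

# Hypothetical flatter layout (an assumption, not the asker's structure):
# one mandatory word and a set of combination words per category.
RULES = {
    'Fruit': {'mandatory': 'banana', 'comb': {'yellow', 'elongated', 'cooking'}},
    'Animal': {'mandatory': 'cat', 'comb': {'mammal', 'white', 'domestic'}},
}


def find_keywords(sentence, rules=RULES):
    """Return (mandatory, comb, category) for the first matching category."""
    # Lowercase and strip surrounding punctuation so 'Cat,' matches 'cat'.
    words = [w.strip(string.punctuation).lower() for w in sentence.split()]
    for category, rule in rules.items():
        if rule['mandatory'] in words:
            # Keep combination words in the order they occur in the sentence.
            comb = [w for w in words if w in rule['comb']]
            return rule['mandatory'], ','.join(comb), category
    # No category matched: return placeholders so callers can still unpack.
    return None, None, None


for text in ['A white domestic cat is playing',
             'yellow banana is not available',
             'nothing relevant here']:
    print(find_keywords(text))
```

    It should drop into the same assignment as above by replacing `findMCF` with `find_keywords` in `zip(*df['Text'].map(...))`.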