python pandas dataframe counter data-manipulation

How to add a column that counts how many times words from a predefined list appear in a text column of a DataFrame?


I want to build a new column that contains the number of times any word or phrase from the ai_functional list occurs in a text column.

The list is:

> ai_functional = ["natural language processing", "nlp", "A I ", "Aritificial intelligence", "stemming", "lemmatization", "lemmatization", "information extraction", "text mining", "text analytics", "data-mining"]

The result I want is as follows:

> text                                                 counter
> 
> 1. More details  A I   Artificial Intelligence       2
> 2. NLP works very well these days                    1
> 3. receiving information at the right time           1

The code I have been using is:

    def func(stringans):
        for x in ai_tech:
            count = stringans.count(x)

        return count

    df['counter'] = df['text'].apply(func)

Can someone please help me with this? I am really stuck: every time I apply this function, I get 0 in the counter column.


Solution

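  • Because you assign with count = on every iteration, each pass overwrites the previous value, so only the count for the last item in the list survives. You want to sum the individual counts instead.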

    def func(stringans):
        count = 0
        for x in ai_tech:
            count += stringans.count(x)
        return count
    
    # Same logic with sum() and a generator expression
    def func(stringans):
        return sum(stringans.count(x) for x in ai_tech)
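
To see the difference between overwriting and accumulating, here is a minimal sketch. The two-item `terms` list, the function names and the sample string are hypothetical and exist only for illustration; they are not part of the original question.

```python
# Hypothetical two-item list, used only to illustrate the overwrite bug.
terms = ["nlp", "stemming"]

def count_overwriting(text):
    # Mirrors the original loop: each pass replaces count,
    # so only the count for the last term is returned.
    count = 0
    for term in terms:
        count = text.count(term)
    return count

def count_accumulating(text):
    # Adds every term's count to a running total.
    count = 0
    for term in terms:
        count += text.count(term)
    return count

sample = "nlp and more nlp"
print(count_overwriting(sample))   # 0, because "stemming" (the last term) never occurs
print(count_accumulating(sample))  # 2
```

With the list from the question, the last item is the one whose count gets returned, which would explain a column of zeros whenever that item is absent from the text.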
    

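    After fixing some typos in ai_tech and comparing everything in lowercase with .lower(), the counter column is 2, 1, 0. The last row gets 0 because it contains none of the listed phrases ("information" alone does not match "information extraction").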

    import pandas as pd
    
    ai_tech = ["natural language processing", "nlp", "A I ", "Artificial intelligence",
               "stemming", "lemmatization", "information extraction",
               "text mining", "text analytics", "data - mining"]
    
    df = pd.DataFrame([["1. More details  A I   Artificial Intelligence"], ["2. NLP works very well these days"],
                       ["3. receiving information at the right time"]], columns=["text"])
    
    def func(stringans):
        return sum(stringans.lower().count(x.lower()) for x in ai_tech)
    
    df['counter'] = df['text'].apply(func)
    print(df)
    
    # Output:
                                                 text  counter
    0  1. More details  A I   Artificial Intelligence        2
    1               2. NLP works very well these days        1
    2      3. receiving information at the right time        0
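
As a possible alternative to `apply`, the same count can be sketched with pandas string methods by joining the escaped terms into one regular expression and passing it to `Series.str.count`. This is a sketch under the assumption that the terms do not overlap one another: a regex alternation counts non-overlapping matches, so it can differ from summing separate `str.count` calls if one term is contained in another. The `pattern` variable is a name introduced here for illustration.

```python
import re
import pandas as pd

ai_tech = ["natural language processing", "nlp", "A I ", "Artificial intelligence",
           "stemming", "lemmatization", "information extraction",
           "text mining", "text analytics", "data - mining"]

df = pd.DataFrame({"text": ["1. More details  A I   Artificial Intelligence",
                            "2. NLP works very well these days",
                            "3. receiving information at the right time"]})

# Escape each term so characters such as "-" are matched literally,
# then join the terms into a single alternation: term1|term2|...
pattern = "|".join(re.escape(term) for term in ai_tech)

# Count case-insensitive, non-overlapping matches in each row.
df["counter"] = df["text"].str.count(pattern, flags=re.IGNORECASE)
print(df)
```

Both this version and the `apply` version count substrings, so a short term such as "nlp" would also be counted inside a longer word that happens to contain it.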