python pandas string data-cleaning trim

Trimming specific words in a dataframe


I have a df with some trigrams (and some longer n-grams), and I would like to remove every row whose phrase starts or ends with a word from a given list. For example:

import pandas as pd
df = pd.DataFrame({'Trigrams+': ['because of tuna', 'to your family', 'pay to you', 'give you in','happy birthday to you'], 'Count': [10,9,8,7,5]})

list_remove = ['of','in','to', 'a']

print(df)

    Trigrams+            Count
0   because of tuna       10
1   to your family         9
2   pay to you             8
3   give you in            7
4   happy birthday to you  5

I tried using strip, but it removes characters rather than whole words: in the example above, the first row would come back as because of tun.
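
To illustrate the problem, here is a minimal sketch. It assumes the attempt called strip once for each word in list_remove; the actual code is not shown in the question, so the loop is only a guess at it.

```python
phrase = "because of tuna"
list_remove = ["of", "in", "to", "a"]

# str.strip(chars) treats its argument as a set of characters to trim
# from both ends of the string, not as a whole word to match.
stripped = phrase
for word in list_remove:  # assumed reconstruction of the attempt
    stripped = stripped.strip(word)

print(stripped)  # because of tun
```

The trailing "a" of "tuna" is trimmed because "a" is in the list, even though "tuna" is not one of the words to remove.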

The output should be like this:

    Trigrams+             Count
0   because of tuna        10
1   pay to you              8
2   happy birthday to you   5

Can someone help me with that? Thanks in advance!


Solution

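  • Try splitting each phrase into words, then keeping only the rows whose first and last words are not in list_remove: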

    list_remove = ["of", "in", "to", "a"]
    
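    # Split each phrase into a list of words.
    tmp = df["Trigrams+"].str.split()
    
    # Keep only rows whose first and last words are both absent from list_remove.
    df = df[~(tmp.str[0].isin(list_remove) | tmp.str[-1].isin(list_remove))]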
    print(df)
    

    Prints:

                   Trigrams+  Count
    0        because of tuna     10
    2             pay to you      8
    4  happy birthday to you      5
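
The filtered frame keeps its original index labels (0, 2, 4), while the desired output in the question is numbered 0, 1, 2. A minimal sketch of the same idea with the index renumbered follows. The helper name drop_edge_words is hypothetical, introduced here only for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "Trigrams+": ["because of tuna", "to your family", "pay to you",
                  "give you in", "happy birthday to you"],
    "Count": [10, 9, 8, 7, 5],
})
list_remove = ["of", "in", "to", "a"]


def drop_edge_words(frame, column, words):
    """Hypothetical helper: drop rows whose first or last word is in `words`."""
    tokens = frame[column].str.split()
    # True where the phrase starts or ends with one of the listed words.
    edge_hit = tokens.str[0].isin(words) | tokens.str[-1].isin(words)
    # reset_index(drop=True) renumbers the surviving rows from 0.
    return frame[~edge_hit].reset_index(drop=True)


result = drop_edge_words(df, "Trigrams+", list_remove)
print(result)
```

Because the comparison is made on whole words after splitting, "tuna" is left alone even though it ends with "a".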