
How to remove words from a data frame that are not in a list in Python


I am a beginner working on non-English NLP. I want to remove every word in a data frame column that is not contained in the list kata_dasar.

My code is:

df['tweet']= [' '.join(w for w in p.split() if w in kata_dasar) for p in df['tweet']]

But it is not working. Please help.


Solution

  • In general, if you find yourself tempted to write a for-loop to iterate over the rows of a dataframe, stop and look for a way to write it with apply instead:

    df['tweet'] = df.tweet.apply(lambda p: ' '.join(w for w in p.split() if w in kata_dasar))
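For reference, the apply approach above can be sketched end to end as follows. The sample tweets and the contents of kata_dasar are made up for illustration, since the question does not show the real data. The helper name keep_known_words, the lowercasing step, and the handling of missing values are also assumptions. They cover two common reasons a filter like this appears to "not work" (case mismatches and NaN rows), but they may not match your actual problem.

```python
import pandas as pd

# Hypothetical sample data: the real kata_dasar and tweets are not shown in the question.
# Using a set (rather than a list) makes each `w in kata_dasar` check a fast hash lookup.
kata_dasar = {"saya", "makan", "nasi"}

df = pd.DataFrame({"tweet": ["saya makan nasi goreng", "Saya suka nasi", None]})


def keep_known_words(text):
    """Keep only the whitespace-separated words that appear in kata_dasar."""
    # Assumption: missing values (None/NaN) should become an empty string,
    # because they have no .split() method and would otherwise raise an error.
    if not isinstance(text, str):
        return ""
    # Assumption: kata_dasar holds lowercase words, so lowercase the text first.
    return " ".join(w for w in text.lower().split() if w in kata_dasar)


df["tweet"] = df["tweet"].apply(keep_known_words)
print(df["tweet"].tolist())
```

If words are still being dropped unexpectedly, it may be worth checking for punctuation attached to words (for example "nasi," will not match "nasi"), which this sketch does not strip.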