Remove list of words from a string Series


I have a list of words to remove:

words_list_to_remove = ['abc', 'def', 'ghi', 'jkl']

I want to remove these words from the string Series (df):

My_strings
first
abc
second
third
def
forth
ghi
jkl

My goal new_df:

My_new_strings
first
second
third
forth

I want to keep each element as a string and also keep the index of each element. I tried converting both of them to sets, but that did not work for me.

Any help would be appreciated!


Solution

  • You can use .isin() and pass your words_list_to_remove to it:

    import pandas as pd
    
    # Define Pandas Series that holds your data
    df = pd.Series(["first","abc","second","third","def","forth","ghi","jkl"])
    
    print("before dropping:\n", df)
    
    # Define list of strings to drop
    words_list_to_remove = ['abc', 'def', 'ghi', 'jkl']
    
    # Only keep rows that are not in list
    df = df[~df.isin(words_list_to_remove)]
    
    print("\nafter dropping:\n", df)
    

    As you can see in the output, the index is preserved as well:

    before dropping:
    0     first
    1       abc
    2    second
    3     third
    4       def
    5     forth
    6       ghi
    7       jkl
    dtype: object
    
    after dropping:
    0     first
    2    second
    3     third
    5     forth
    dtype: object
    

    Note: df is the conventional name for a DataFrame, so it would be clearer to give your Series a different name to avoid confusion.
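
For what it's worth, here is a minimal sketch of the same .isin() approach with the Series renamed and the filter wrapped in a small helper. The names remove_words, my_strings and my_new_strings, as well as the letter index labels, are assumptions made up for illustration and do not come from the question. The non-default index is only there to make it easier to see that the original labels survive the filtering.

```python
import pandas as pd


def remove_words(strings, words):
    """Hypothetical helper: keep only the elements of `strings` that are not in `words`."""
    # isin() builds a boolean mask of exact, whole-element matches; ~ inverts it
    return strings[~strings.isin(words)]


# Same data as above, with assumed letter labels as the index
my_strings = pd.Series(
    ["first", "abc", "second", "third", "def", "forth", "ghi", "jkl"],
    index=list("abcdefgh"),
    name="My_strings",
)
words_list_to_remove = ["abc", "def", "ghi", "jkl"]

# Boolean indexing keeps the original index labels of the surviving rows
my_new_strings = remove_words(my_strings, words_list_to_remove).rename("My_new_strings")

print(my_new_strings.index.tolist())  # ['a', 'c', 'd', 'f']
print(my_new_strings.tolist())        # ['first', 'second', 'third', 'forth']
```

Keep in mind that .isin() compares whole elements, so this drops a row only when the entire string equals one of the listed words.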