python, pandas, string-matching

How can I select all columns of a DataFrame whose names partially match strings in a list?


Suppose I have a DataFrame like this:

import pandas as pd

df = pd.DataFrame({'foo': [1, 2, 3], 'bar': [4, 5, 6], 'ber': [7, 8, 9]})

Given a list of "filter" strings like mylist = ['oo', 'ba'], how can I select all columns in df whose names partially match any of the strings in mylist? For this example, the expected output is a DataFrame with only the columns foo and bar, i.e. {'foo': [1, 2, 3], 'bar': [4, 5, 6]}.


Solution

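  • You can use df.filter with its regex parameter. Joining the strings with '|' builds a pattern that matches any one of them, and filter keeps every column whose name contains a match (it applies re.search to each column label).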

    import pandas as pd
    
    # sample dataframe
    df = pd.DataFrame({'foo': [1, 2, 3], 'bar': [4, 5, 6], 'ber': [7, 8, 9]})
    
    # sample list of strings
    mylist = ['oo', 'ba']
    
    # join the list into a single regex alternation: 'oo|ba'
    matches = '|'.join(mylist)
    
    # keep the columns whose name contains a match for the pattern
    # (here: 'foo' and 'bar'; 'ber' is dropped)
    df_out = df.filter(regex=matches)
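
One caveat with the joined pattern: the strings are interpreted as regular expressions, so a filter string containing characters such as '.' or '(' would not be matched literally. The sketch below is not part of the original answer; it shows two possible variations on the same data, assuming the strings are meant as plain substrings. The names pattern, cols, df_regex and df_plain are chosen here for illustration only.

```python
import re

import pandas as pd

# same sample data as in the question
df = pd.DataFrame({'foo': [1, 2, 3], 'bar': [4, 5, 6], 'ber': [7, 8, 9]})
mylist = ['oo', 'ba']

# Variation 1: escape each string so regex metacharacters are matched literally,
# then join into an alternation and filter as before
pattern = '|'.join(re.escape(s) for s in mylist)
df_regex = df.filter(regex=pattern)

# Variation 2: no regex at all; keep columns whose name contains any of the strings
cols = [c for c in df.columns if any(s in c for s in mylist)]
df_plain = df[cols]

print(list(df_regex.columns))  # ['foo', 'bar']
print(list(df_plain.columns))  # ['foo', 'bar']
```

The two variations differ when mylist is empty: an empty pattern matches every column name, so the regex version keeps all columns, while the list-comprehension version keeps none.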