Search code examples
pythonpandasdataframeloopspandas-loc

python: loop through list of dataframes and list of lists


I have a list of dataframes (lst_dfs) of which i want to subset columns using a list of partial column names (lst). The list of the columns needs to be applied with startswith as sometimes the end of the identifier is different. the index of the list of dataframes matches the index of the list of names. Its easy to apply with one dataframe, but not with this list/in a loop. The expected output would be a dictionary containing a list of two dataframes with the subsetted columns but its returning empty. I think my level of iteration is incorrect (amongst other things?). any help is very appreciated. thanks so much!

two data framees that i put in to a list

df1 = pd.DataFrame(data={'x':[1,2,3,4,5], 
                         'am.1': [1,1,1,1,1],
                         'abn.1': [1,1,1,1,1],
                         'b1c': [1,1,1,1,1],
                         'b1d': [1,1,1,1,1]})

df2 = pd.DataFrame(data={'x':[1,2,3,4,5], 
                         'am.1': [1,1,1,1,1],
                         'am.1': [1,1,1,1,1],
                         'al.2': [1,1,1,1,1],
                         'b1d': [1,1,1,1,1],
                         'b2d': [1,1,1,1,1]})

lst_dfs = [df1, df1]

lst = (['a','b'],['am','b1'])
dat={}
for i, df in enumerate(lst_dfs):
    for elem in lst:
        print(elem)
        dat[i] = df.loc[(df.columns.str.startswith(str(elem)))]


Solution

  • Use df.filter with regex param to filter out the columns that start with elements in your list:

    from collections import defaultdict
    dat = defaultdict(list)
    
    for i, df in enumerate(lst_dfs):
        for elem in lst:
            dat[i].append(df.filter(regex='^('+'|'.join(elem)+')', axis=1))
    

    Output:

    >>> dat[0]
    [   am.1  abn.1  b1c  b1d
    0     1      1    1    1
    1     1      1    1    1
    2     1      1    1    1
    3     1      1    1    1
    4     1      1    1    1,    am.1  b1c  b1d
    0     1    1    1
    1     1    1    1
    2     1    1    1
    3     1    1    1
    4     1    1    1]