Search code examples
pythonpandasdataframedictionarysubstring

How to select dataframe in dictionary of dataframes that contains a column with specific substring


I have a dictionary of dataframes df_dict. I then have a substring "blue". I want to identify the name of the dataframe in my dictionary of dataframes that has at least one column that has a name containing the substring "blue".

I am thinking of trying something like:

for df in df_dict:
     if df.columns.contains('blue'):
          return df
     else:
          pass

However, I am not sure if a for loop is necessary here. How can I find the name of the dataframe I am looking for in my dictionary of dataframes?


Solution

  • I think loops are necessary for iterate items of dictionary:

    df1 = pd.DataFrame({"aa_blue": [1,2,3],
                       'col':list('abc')})
    df2 = pd.DataFrame({"f": [1,2,3],
                       'col':list('abc')})
    df3 = pd.DataFrame({"g": [1,2,3],
                       'bluecol':list('abc')})
    
    df_dict = {'df1_name' : df1, 'df2_name' : df2, 'df3_name' : df3}
    
    out = [name for name, df in df_dict.items() if df.columns.str.contains('blue').any()]
    print (out)
    ['df1_name', 'df3_name']
    

    Or:

    out = [name for name, df in df_dict.items() if any('blue' in y for y in df.columns)]
    print (out)
    ['df1_name', 'df3_name']
    

    For list of DataFrames use:

    out = [df for name, df in df_dict.items() if df.columns.str.contains('blue').any()]
    out = [df for name, df in df_dict.items() if any('blue' in y for y in df.columns)]
    print (out)
    [   aa_blue col
    0        1   a
    1        2   b
    2        3   c,    g bluecol
    0  1       a
    1  2       b
    2  3       c]