I have a dictionary of dataframes df_dict
. I then have a substring "blue". I want to identify the name of the dataframe in my dictionary of dataframes that has at least one column that has a name containing the substring "blue".
I am thinking of trying something like:
for df in df_dict:
if df.columns.contains('blue'):
return df
else:
pass
However, I am not sure if a for loop is necessary here. How can I find the name of the dataframe I am looking for in my dictionary of dataframes?
I think loops are necessary for iterate items of dictionary:
df1 = pd.DataFrame({"aa_blue": [1,2,3],
'col':list('abc')})
df2 = pd.DataFrame({"f": [1,2,3],
'col':list('abc')})
df3 = pd.DataFrame({"g": [1,2,3],
'bluecol':list('abc')})
df_dict = {'df1_name' : df1, 'df2_name' : df2, 'df3_name' : df3}
out = [name for name, df in df_dict.items() if df.columns.str.contains('blue').any()]
print (out)
['df1_name', 'df3_name']
Or:
out = [name for name, df in df_dict.items() if any('blue' in y for y in df.columns)]
print (out)
['df1_name', 'df3_name']
For list of DataFrames use:
out = [df for name, df in df_dict.items() if df.columns.str.contains('blue').any()]
out = [df for name, df in df_dict.items() if any('blue' in y for y in df.columns)]
print (out)
[ aa_blue col
0 1 a
1 2 b
2 3 c, g bluecol
0 1 a
1 2 b
2 3 c]