Dataframes:
df1:
ind_lst
[agriculture_dairy, analytics]
[architecture_planning, advertising_pr_events, analytics]
df2:
ind score
advertising_pr_events 3.672947168
agriculture_dairy 3.368266582
airlines_aviation_aerospace 3.60798955
analytics 3.67752552
architecture_planning 3.853236675
I have 2 data frames as shown above.
Column ind of df2 is of type string & has industry names. df2.score has a unique score for each industry.
Column ind_lst of df1 is a list of industry names formed by different combinations of unique elements of df2.ind
I want to create a column in df1 avg_score such that the final output is as below:
avg_score in row1 of df1 is avg of the scores from df2 corresponding to agriculture_dairy, analytics
avg_score in row2 of df1 is avg of scores from df2 corresponding to architecture_planning, advertising_pr_events, analytics.
df1
ind_lst avg_score
[agriculture_dairy, analytics] 3.522896051
[architecture_planning, advertising_pr_events, analytics] 3.734569787666667
How to implement this in python? I tried this:
avg=[]
avgs=[]
for i in df1.ind_lst:
for j in i:
a= df2[df2['ind'] == j].index
avg.append(df2.loc[a]['score'])
avgs.append(mean(avg))
df1['avg_score']= avgs
the code doesn't work well. I get an empty list as output.
NOTE: I have sampled the data frames & shared only a subset of the problem statement. the real dataframes df1 has 90k entries & df2 has 860 rows.
Option 1: Create a map of ind and score from df2. Use list comprehension to map the score to the industry and calculate mean.
mapper = df2.set_index('ind')['score'].to_dict()
df1['avg_score'] = df1['ind_lst'].apply(lambda x: np.mean([mapper[i] for i in x]))
ind_lst avg_score
0 [agriculture_dairy, analytics] 3.522896
1 [architecture_planning, advertising_pr_events,... 3.734570
Option 2: Use DataFrame.explode to unnest the list, groupby index and apply mean
df1['avg_score'] = df1['ind_lst'].explode().map(mapper).groupby(level = 0).mean()