I am trying to learn more about the pandas apply method in Python and am wondering how to rewrite the following code using apply.
I have a dataframe df like the following:
   A  B  C  D   E  points
0  0  0  0  1  43      94
1  0  0  1  1  55      62
2  1  1  0  1  21      84
3  1  0  1  0  13      20
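For anyone who wants to reproduce this, the frame above can be rebuilt roughly as follows. This is a minimal sketch: the values are copied from the table, and plain integer columns with a default integer index are an assumption.

```python
import pandas as pd

# Values copied from the table in the question; integer dtype is assumed.
df = pd.DataFrame({
    'A': [0, 0, 1, 1],
    'B': [0, 0, 1, 0],
    'C': [0, 1, 0, 1],
    'D': [1, 1, 1, 0],
    'E': [43, 55, 21, 13],
    'points': [94, 62, 84, 20],
})
print(df)
```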
Furthermore I have a function like the following, which does its job:
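import pandas as pd

def f1(df):
    # For each column, average 'points' over the rows where that column equals 1.
    df_means = pd.DataFrame(columns=['Mean_Points'])
    for columnname in df.columns:
        # Keep only columns that have more than one row equal to 1.
        if len(df[df[columnname] == 1]) > 1:
            df_means.loc[columnname] = [df[df[columnname] == 1]['points'].mean()]
    return df_means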
So the output of f1 is
   Mean_Points
A           52
C           41
D           80
and that's totally fine. But I am wondering whether it is possible (I am sure it is) to obtain the same result with the apply method. I tried:
df_means = pd.DataFrame(columns = ['Mean_Points'])
cols = [col for col in df.columns if len(df[df[col] == 1]) > 1]
df_means.loc[cols] = df[cols].apply(lambda x: df[df[x] == 1]['points'].mean(), axis = 1)
or similar:
df_means = pd.DataFrame(columns = ['Mean_Points'])
df.columns.apply(lambda x: df_means.loc[x] = [df[df[x] == 1]['points'].mean()] if len(df[df[x] == 1]) > 1 else None)
and two or three other things, but nothing worked. I hope somebody can help me here.
pd.DataFrame.dot
# Count the 1s in each column, then filter s down to
# just those counts greater than 1.
s = df.eq(1).sum().loc[lambda x: x > 1]
df.loc[:, s.index].T.dot(df.points).div(s)
A 52.0
C 41.0
D 80.0
dtype: float64
The following one-liner removes the chaff but probably does more calculations than necessary, since it computes a value for every column before filtering:
df.T.dot(df.points).div(df.sum())[df.eq(1).sum().gt(1)]
A 52.0
C 41.0
D 80.0
dtype: float64
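Since the question asks about apply specifically, here is a minimal sketch of an apply-based version for comparison. It assumes the frame from the question, rebuilt here so the snippet is self-contained. The helper name `mean_points_where_one` and its `min_count` parameter are hypothetical, introduced only for illustration. The dot approach above relies on the selected columns holding only 0s and 1s, so that the dot product with points adds up the points of the rows flagged 1; the sketch below computes that mean directly from a boolean mask instead.

```python
import pandas as pd

# Frame from the question (values copied from the table; integer dtype assumed).
df = pd.DataFrame({
    'A': [0, 0, 1, 1],
    'B': [0, 0, 1, 0],
    'C': [0, 1, 0, 1],
    'D': [1, 1, 1, 0],
    'E': [43, 55, 21, 13],
    'points': [94, 62, 84, 20],
})

def mean_points_where_one(col, points, min_count=2):
    """Mean of `points` over rows where `col` equals 1, or NaN if the
    column has fewer than `min_count` such rows (hypothetical helper)."""
    mask = col.eq(1)
    if mask.sum() < min_count:
        return float('nan')
    return points[mask].mean()

# With the default axis=0, apply passes each column to the function as a Series.
means = df.apply(lambda col: mean_points_where_one(col, df['points']))

# Columns that did not qualify came back as NaN; drop them.
means = means.dropna()
df_means = means.to_frame('Mean_Points')
print(df_means)  # Mean_Points: A 52.0, C 41.0, D 80.0
```

Note that apply with the default axis=0 hands the function one column at a time, whereas axis=1 hands it one row at a time, so a per-column calculation like this one needs the default.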