
Create a new dataframe by applying function to columns of another dataframe


I am trying to learn more about the pandas apply method and am wondering how to write the following code using apply:

I have a dataframe df like the following:

   A  B  C  D   E  points
0  0  0  0  1  43      94
1  0  0  1  1  55      62
2  1  1  0  1  21      84
3  1  0  1  0  13      20

Furthermore I have a function like the following, which does its job:

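import pandas as pd

def f1(df):
  # Collect the mean of 'points' per qualifying column, indexed by column name
  df_means = pd.DataFrame(columns = ['Mean_Points'])
  for columnname in df.columns:
    # Only keep columns in which the value 1 appears in more than one row
    if len(df[df[columnname] == 1]) > 1:
      df_means.loc[columnname] = [df[df[columnname] == 1]['points'].mean()]
  return df_means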

So the output of f1 is

   Mean_Points
A           52
C           41
D           80

and that's totally fine. But I am wondering whether the same result can be obtained with the apply method (I am sure it can). I tried:

df_means = pd.DataFrame(columns = ['Mean_Points'])
cols = [col for col in df.columns if len(df[df[col] == 1]) > 1]
df_means.loc[cols] = df[cols].apply(lambda x: df[df[x] == 1]['points'].mean(), axis = 1)

or similar:

df_means = pd.DataFrame(columns = ['Mean_Points'])
df.columns.apply(lambda x: df_means.loc[x] = [df[df[x] == 1]['points'].mean()] if len(df[df[x] == 1]) > 1 else None)

and two or three other things, but nothing worked. I hope somebody can help me here.


Solution

  • pd.DataFrame.dot

    #                      filters s to be just those
    #                      things greater than 1
    #                      v
    s = df.eq(1).sum().loc[lambda x: x > 1]
    df.loc[:, s.index].T.dot(df.points).div(s)
    
    A    52.0
    C    41.0
    D    80.0
    dtype: float64
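
To see why the dot product gives the right answer, here is a minimal sketch for the single column A. It assumes, as in the question's data, that the indicator columns contain only 0 and 1. Under that assumption the dot product with points is simply the sum of points over the rows where the column is 1, and dividing by the number of ones turns that sum into a mean. The variable names are illustrative only.

```python
import pandas as pd

# The dataframe from the question.
df = pd.DataFrame({
    "A": [0, 0, 1, 1],
    "B": [0, 0, 1, 0],
    "C": [0, 1, 0, 1],
    "D": [1, 1, 1, 0],
    "E": [43, 55, 21, 13],
    "points": [94, 62, 84, 20],
})

# For a 0/1 column the dot product adds up points only where the column is 1:
# 0*94 + 0*62 + 1*84 + 1*20
dot_sum = df["A"].dot(df["points"])
mask_sum = df.loc[df["A"].eq(1), "points"].sum()

# Dividing by the number of ones gives the mean of the selected rows.
dot_mean = dot_sum / df["A"].eq(1).sum()
mask_mean = df.loc[df["A"].eq(1), "points"].mean()

print(dot_sum, mask_sum, dot_mean, mask_mean)
```

For a column such as E, which is not 0/1, the dot product is not a masked sum. That does not affect the result here, because E never equals 1 and is removed by the filter.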
    

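    One-liner approach

    This is more compact, but it computes the dot product and the column sum for every column (including E and points) before filtering, so it does more calculations than necessary.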

    df.T.dot(df.points).div(df.sum())[df.eq(1).sum().gt(1)]
    
    A    52.0
    C    41.0
    D    80.0
    dtype: float64
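
Since the question asks specifically about apply, here is a hedged sketch of one way to get the same result with it. This is not part of the answer above, and the names counts, cols and means are illustrative. It relies on DataFrame.apply with the default axis=0, which passes each column to the function as a Series. The attempts in the question used axis=1, which passes rows, and df.columns.apply, which does not exist on an Index. An assignment inside a lambda is also a syntax error.

```python
import pandas as pd

# The dataframe from the question.
df = pd.DataFrame({
    "A": [0, 0, 1, 1],
    "B": [0, 0, 1, 0],
    "C": [0, 1, 0, 1],
    "D": [1, 1, 1, 0],
    "E": [43, 55, 21, 13],
    "points": [94, 62, 84, 20],
})

# Keep only the columns in which the value 1 occurs more than once.
counts = df.eq(1).sum()
cols = counts[counts > 1].index

# With the default axis=0, apply hands each column to the lambda as a Series;
# the lambda returns the mean of points over the rows where that column is 1.
means = df[cols].apply(lambda col: df.loc[col.eq(1), "points"].mean())

# Shape the result like the output of f1.
df_means = means.to_frame("Mean_Points")
print(df_means)
```

Unlike the dot-product approach, this version selects rows with a boolean mask, so it does not depend on the other values in a column being 0.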