Search code examples
pythonpandasdataframepandas-groupbymelt

Python: Add aggregated columns to DataFrame based on the Key and additional conditions


I have 2 dataframes in the following view: dogs dataframe is:

DogID   PuppyName1  PuppyName2  PuppyName3  PuppyName4   DogWeight
Dog1       Nick         NaN         NaN      NaN            12.7
Dog2       Jack         Fox         Rex      NaN            15.5         
Dog3       Snack        NaN         NaN      NaN            10.2
Dog4       Yosee        Petty       NaN      NaN            16.9

puppyWeights dataframe is:

PuppyName   Jan17  Jun18  Dec18  April19   
Nick         0.8    1.7     3.7     4.6
Jack         0.6    1.3     2.8     3.5        
Fox          0.9    1.7     3.4     4.3
Rex          1.0    2.3     3.0     4.2
Snack        0.8    1.7     2.8     4.4
Yosee        0.6    1.2     3.1     4.3
Petty        0.5    1.3     2.8     3.5 

I need to add information about puppies weight by Months to the Dogs dataframe based on PuppyWeights dataframe. If Dog has more than 1 child, for instance: Dog2, Dog3-> I need to take mean of the weight values by PuppyName for each Month. For example: Dog2 should be average between values for Jack and Foxin PuppyWeights table:

DogID   Jan17  Jun18  Dec18  April19   DogWeight
Dog2     0.75  1.5     3.1     3.9        15.5     

I tried to use melt function to convert ['PuppyName1', 'PuppyName2', 'PuppyName3', 'PuppyName4'] columns into rows.

But, I don't have idea how to add months information to dogs dataframe with aggregation by mean when dog has more than one child.

df2 = dogs.melt(id_vars=['DogID','DogWeight'], var_name="Puppies", value_name='PuppyName')

Desired Output is:

DogID   Jan17  Jun18  Dec18  April19   DogWeight
Dog1     0.8    1.7    3.7     4.6        12.7     
Dog2     0.75   1.5    3.1     3.9        15.5   
Dog3     0.8    1.7    2.8     4.4        10.2
Dog4     0.55   1.25   2.95    3.9        16.9  

How I can add weight information by Month to the dogs dataframe?

I'd appreciate for any idea. Thanks)


Solution

  • Here is one way melt the dogs , then merge and groupby

    df2 = dogs.melt(id_vars=['DogID','DogWeight'], var_name="Puppies", value_name='PuppyName').dropna()
    
    df2.merge(df,on='PuppyName',how='left').groupby('DogID').mean()
    Out[423]: 
           DogWeight     Jan17     Jun18     Dec18  April19
    DogID                                                  
    Dog1        12.7  0.800000  1.700000  3.700000      4.6
    Dog2        15.5  0.833333  1.766667  3.066667      4.0
    Dog3        10.2  0.800000  1.700000  2.800000      4.4
    Dog4        16.9  0.550000  1.250000  2.950000      3.9