Search code examples
pythonpandasgroup-byaggregation

Aggregating a String Field as a Record Indepdent of other Aggregations in GroupBy


I have data that looks like df1

Store owner
Store1 A
Store1 B
Store2 B
Store2 C

and df2

Store Sold_Item Price
Store1 Apple 2
Store1 Orange 3
Store1 Apple 2
Store2 Apple 3
Store Banana 2

I would like to get something like

Store Owners Num_Sales Revenue
Store1 A 3 7
B
Store2 B 2 5
C

I have tried something like:

df = df1.merge(df2, how='inner', on='Store')
df_stats = df.groupby(['Store']).agg({"Sold_Item": np.count, "Price": 'sum'})

but I cannot what to do with the Owner field.

Any advice?


Solution

  • Shouldn't the last store of df2 be "Store2 ?

    out = (
        df1.merge(df2, on="Store", how="inner")
            .groupby(["Store", "owner"], as_index=False)
            .agg({"Sold_Item": "count", "Price": "sum"})  
    )
    
    out.loc[out["Store"].duplicated(), out.columns.difference(["owner"])] = None
    
    # out.to_excel("file.xlsx", index=False) # or `to_csv`
    

    Output (in Excel) :

    Store owner Sold_Item Price
    Store1 A 3 7
    B
    Store2 B 2 5
    C