
How to combine only specific rows in a dataframe


I have one DataFrame and one Series.

The DataFrame (df1) lists each model, its parts, and the quantity of each part.

df1 = pd.DataFrame({'model':['A','A','A','B','B','C','C','C','D','D'],
                'part':['ax','ay','az','bx','by','cx','cy','cz','dx','dy'],
                'qty':[1,2,3,2,4,2,5,3,1,2]})


The Series (s1) records whether the parts of each model can be merged: "on" if merging is possible, "off" if it is not.

s1 = pd.Series({'A':'on', 'B':'off','C':'off','D':'on'})


For models whose merge flag is "off", I want to leave the rows as they are. For models whose merge flag is "on", I want to combine the rows into a single row and show the sum of qty.


Solution

  • Here is one very direct approach:

    1. join the "merge" indicator to the main dataframe
    2. select the rows whose model is flagged for merging
    3. do the groupby on the model for that subset only
    4. join the result back to the rest of the data that you did not want to merge

    Code below:

    import pandas as pd
    df1 = pd.DataFrame({'model':['A','A','A','B','B','C','C','C','D','D'],
                'part':['ax','ay','az','bx','by','cx','cy','cz','dx','dy'],
                'qty':[1,2,3,2,4,2,5,3,1,2]})
    s1 = pd.Series({'A':'on', 'B':'off','C':'off','D':'on'}, name='merge?')
    

    Part 1

    df2 = df1.merge(s1,left_on=['model'], right_index=True)
    

    Part 2

    to_merge_ind = df2.loc[:,'merge?'] == 'on'
    

    Part 3

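    # collapse the flagged rows to one row per model;
    # the string 'sum' avoids the warning newer pandas raises for the builtin sum
    merged = df1[to_merge_ind].groupby('model')\
             .agg({'part': '/'.join, 'qty': 'sum'}).reset_index()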
    

    Part 4

    # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
    final = pd.concat([merged, df1[~to_merge_ind]])
    

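    Output: models A and D are each reduced to a single row (parts joined with '/', qty summed), followed by the untouched rows of models B and C.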
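
The four steps can also be wrapped into one small function. This is only a sketch, not the answer's original code: the function name `combine_flagged_models` and its `on_value` parameter are made up for illustration, and it looks up the flag with `Series.map` instead of `merge`, so no helper column is added to the dataframe. It assumes every model in `df1` has an entry in `s1`.

```python
import pandas as pd


def combine_flagged_models(df, flags, on_value="on"):
    """Collapse rows of models flagged `on_value`; leave all other rows as they are.

    Hypothetical helper: the name and signature are assumptions for this sketch.
    """
    # Look up each row's flag by model, giving a boolean mask aligned with df.
    mask = df["model"].map(flags).eq(on_value)

    # One row per flagged model: parts joined with '/', quantities summed.
    combined = (
        df[mask]
        .groupby("model", as_index=False)
        .agg(part=("part", "/".join), qty=("qty", "sum"))
    )

    # Put the collapsed rows first, then the untouched rows, and renumber the index.
    return pd.concat([combined, df[~mask]], ignore_index=True)


df1 = pd.DataFrame({'model': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C', 'D', 'D'],
                    'part': ['ax', 'ay', 'az', 'bx', 'by', 'cx', 'cy', 'cz', 'dx', 'dy'],
                    'qty': [1, 2, 3, 2, 4, 2, 5, 3, 1, 2]})
s1 = pd.Series({'A': 'on', 'B': 'off', 'C': 'off', 'D': 'on'})

result = combine_flagged_models(df1, s1)
print(result)
```

Passing `ignore_index=True` gives the result a fresh 0..n-1 index; drop it if the original row labels of the unmerged rows should be kept.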