Search code examples
pandasdataframeconcatenationdata-manipulation

rows concatenate in pandas dataframe -new version


I have following table.

import pandas as pd
# Define the input data
data = {
    'ID': [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3],
    'count': [1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,1,1,1,1,2,2,1,1,1,1,2],
    'priority': [1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4,4,3,1,2,3,4,4],
    'item': ['A','B','C','D','A','B','C','D','A','B','C','D','A','B','C','D','A','B','C','D','D','C','A','B','C','D','D'],
    'c': ['XX','XX','XX','XX','YY-SS','YY','YY','YY','YY-SS','YY','YY','YY','XX','XX','XX','XX','ZZ','ZZ','ZZ','ZZ','ZZ','ZZ','TT-SS','ZZ','ZZ','ZZ','ZZ']
}

# Convert the input data to a Pandas DataFrame
df = pd.DataFrame(data)

enter image description here

I need to transform this input as you can see in below output example: enter image description here enter image description here

If you have any ideas please share. Thank you very much!


Solution

  • You can use a custom groupby.agg:

    out = (df
       .sort_values(by='priority') # optional
       .groupby(['ID', 'count'], as_index=False)
       .agg({'item': '-'.join, 'c': 'first'})
       .assign(FINAL=lambda d: d.pop('item')+'-'+d.pop('c'))
       .drop(columns='count')
    )
    

    Output:

       ID          FINAL
    0   1     A-B-C-D-XX
    1   1  A-B-C-D-YY-SS
    2   1  A-B-C-D-YY-SS
    3   1     A-B-C-D-XX
    4   2     A-B-C-D-ZZ
    5   2         D-C-ZZ
    6   3  A-B-C-D-TT-SS
    7   3           D-ZZ