I have following table.
import pandas as pd
# Define the input data
data = {
'ID': [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3],
'count': [1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,1,1,1,1,2,2,1,1,1,1,2],
'priority': [1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4,4,3,1,2,3,4,4],
'item': ['A','B','C','D','A','B','C','D','A','B','C','D','A','B','C','D','A','B','C','D','D','C','A','B','C','D','D'],
'c': ['XX','XX','XX','XX','YY-SS','YY','YY','YY','YY-SS','YY','YY','YY','XX','XX','XX','XX','ZZ','ZZ','ZZ','ZZ','ZZ','ZZ','TT-SS','ZZ','ZZ','ZZ','ZZ']
}
# Convert the input data to a Pandas DataFrame
df = pd.DataFrame(data)
I need to transform this input as you can see in below output example: enter image description here
If you have any ideas please share. Thank you very much!
You can use a custom groupby.agg
:
out = (df
.sort_values(by='priority') # optional
.groupby(['ID', 'count'], as_index=False)
.agg({'item': '-'.join, 'c': 'first'})
.assign(FINAL=lambda d: d.pop('item')+'-'+d.pop('c'))
.drop(columns='count')
)
Output:
ID FINAL
0 1 A-B-C-D-XX
1 1 A-B-C-D-YY-SS
2 1 A-B-C-D-YY-SS
3 1 A-B-C-D-XX
4 2 A-B-C-D-ZZ
5 2 D-C-ZZ
6 3 A-B-C-D-TT-SS
7 3 D-ZZ