Search code examples
pythonpandasdataframegroup-by

rows concatenate in pandas dataframe


I have following table.

enter image description here

I need to transform this input as you can see in below output example: enter image description here

import pandas as pd

# Define the input data
data = {
    'ID': [500, 500, 500, 500, 500, 500, 500, 500, 400, 400, 400, 400, 400, 300, 200],
    'item': ['A', 'B', 'C', 'D', 'E', 'A', 'B', 'C', 'A', 'B', 'A', 'C', 'E', 'D', 'E'],
    'Counter': [1, 2, 3, 4, 5, 1, 2, 3, 1, 2, 1, 2, 1, 1, 1],
    'C': ['XX', 'XX', 'XX', 'XX', 'XX', 'YY', 'YY', 'YY', 'XX', 'XX', 'YY', 'YY', 'YY', 'XX', 'XX']
}

# Convert the input data to a Pandas DataFrame
df = pd.DataFrame(data)

If you have any ideas please share. Thank you very much!



Solution

  • Your I/O don't not match but (based on your input screenshot), you can try this :

    out = (df.groupby(["ID", df["priority"].eq(1).cumsum(), "C"], as_index=False, sort=False)
              ["item"].agg("-".join).assign(ideal= lambda x: x.pop("item").str.cat(x.pop("C"), sep="-"))
          )
    

    ​ Output :

    print(out)
    
        ID         ideal
    0  500  A-B-C-D-E-XX
    1  500    A-B-C-E-YY
    2  400        A-B-XX
    3  400        A-C-YY
    4  400          E-YY
    5  300          D-XX
    6  200          E-XX