Search code examples
pythonpandasdataframenumpydatatable

How can I merge my lines with the same value except for one column into one line in my dataframe/table?


I habe a table with three columns, which contain a lot of the same values, but the lines differ in the second column. I want to create a dictionary (or a table) that has all the lines with the same values merged and the values from the second column as a list in the second column. So basically I want to turn this:

1 2 3
Value1 ValueA ValueI
Value1 ValueB ValueI

into this:

1 2 3
Value1 ValueA; ValueB ValueI

I googled about pandas-functions, but I couldn't find anything that helps. I tried around with a lot of for-loops, but I couldn't get it two work and also I wonder if there isn't an easy way to do it?


Solution

  • There's a method groupby to group data by sets of values in provided columns. And the there's method agg, which applies aggregation function to each group. In your case - collecting data into a list.

    Here's a simple snippet that creates mock data and does the job:

    import pandas as pd
    
    df = pd.DataFrame({
        "1": ["Value11", "Value11", "Value12", "Value12"],
        "2": ["Value21", "Value22", "Value23", "Value24"],
        "3": ["Value31", "Value31", "Value32", "Value32"],
    })
    
    df \
        .groupby(["1", "3"], as_index=False) \
        .agg(list)
    

    Output:

             1        3                   2
    0  Value11  Value31  [Value21, Value22]
    1  Value12  Value32  [Value23, Value24]