How can I merge my lines with the same value except for one column into one line in my dataframe/table?

I habe a table with three columns, which contain a lot of the same values, but the lines differ in the second column. I want to create a dictionary (or a table) that has all the lines with the same values merged and the values from the second column as a list in the second column. So basically I want to turn this:

1	2	3
Value1	ValueA	ValueI
Value1	ValueB	ValueI

into this:

1	2	3
Value1	ValueA; ValueB	ValueI

I googled about pandas-functions, but I couldn't find anything that helps. I tried around with a lot of for-loops, but I couldn't get it two work and also I wonder if there isn't an easy way to do it?

Solution

There's a method groupby to group data by sets of values in provided columns. And the there's method agg, which applies aggregation function to each group. In your case - collecting data into a list.

Here's a simple snippet that creates mock data and does the job:

import pandas as pd

df = pd.DataFrame({
    "1": ["Value11", "Value11", "Value12", "Value12"],
    "2": ["Value21", "Value22", "Value23", "Value24"],
    "3": ["Value31", "Value31", "Value32", "Value32"],
})

df \
    .groupby(["1", "3"], as_index=False) \
    .agg(list)

Output:

         1        3                   2
0  Value11  Value31  [Value21, Value22]
1  Value12  Value32  [Value23, Value24]