I habe a table with three columns, which contain a lot of the same values, but the lines differ in the second column. I want to create a dictionary (or a table) that has all the lines with the same values merged and the values from the second column as a list in the second column. So basically I want to turn this:
1 | 2 | 3 |
---|---|---|
Value1 | ValueA | ValueI |
Value1 | ValueB | ValueI |
into this:
1 | 2 | 3 |
---|---|---|
Value1 | ValueA; ValueB | ValueI |
I googled about pandas-functions, but I couldn't find anything that helps. I tried around with a lot of for-loops, but I couldn't get it two work and also I wonder if there isn't an easy way to do it?
There's a method groupby
to group data by sets of values in provided columns. And the there's method agg
, which applies aggregation function to each group. In your case - collecting data into a list.
Here's a simple snippet that creates mock data and does the job:
import pandas as pd
df = pd.DataFrame({
"1": ["Value11", "Value11", "Value12", "Value12"],
"2": ["Value21", "Value22", "Value23", "Value24"],
"3": ["Value31", "Value31", "Value32", "Value32"],
})
df \
.groupby(["1", "3"], as_index=False) \
.agg(list)
Output:
1 3 2
0 Value11 Value31 [Value21, Value22]
1 Value12 Value32 [Value23, Value24]