Is there a neat way to aggregate columns into a new column without duplicating information?
For example, if I have a df:
   Description  Information
0        text1        text1
1        text2        text3
2        text4        text5
And I want to create a new column called 'Combined', which aggregates 'Description' and 'Information' to get:
   Description  Information     Combined
0        text1        text1        text1
1        text2        text3  text2 text3
2        text4        text5  text4 text5
So far I have been using np.where with a boolean mask to check for duplicates, and then aggregating the remaining rows with df['Combined'] = df[['Description', 'Information']].agg(' '.join, axis=1)
Although this works, it is not practical on a larger scale. I would be grateful if anyone knows of a simpler way!
You can call unique on each row before joining:
df['Combined'] = (
    df[['Description', 'Information']]
    .agg(lambda x: ' '.join(x.unique()), axis=1)
)
Output:
   Description  Information     Combined
0        text1        text1        text1
1        text2        text3  text2 text3
2        text4        text5  text4 text5
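For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The helper name join_unique and its sep parameter are made up for illustration, and skipping missing values is an assumption on my part (the question does not say whether the columns can contain NaN). It uses dict.fromkeys instead of unique to drop repeats while keeping first-seen order.

```python
import pandas as pd

# Sample frame matching the one in the question
df = pd.DataFrame({
    "Description": ["text1", "text2", "text4"],
    "Information": ["text1", "text3", "text5"],
})

def join_unique(row, sep=" "):
    # Hypothetical helper: skip missing values (an assumption, not in the question)
    values = [v for v in row if pd.notna(v)]
    # dict.fromkeys drops repeats while preserving first-seen order
    return sep.join(dict.fromkeys(values))

# axis=1 passes each row to the helper as a Series
df["Combined"] = df[["Description", "Information"]].apply(join_unique, axis=1)
print(df)
```

This still runs a Python function once per row, so it is probably not faster than the lambda above on large frames; the main difference is that it will not raise if a cell is missing.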