Search code examples
pythonpandascsvjoinrow

Join rows and concatenate attribute values in a csv file with pandas


I have a csv file structured like this:

enter image description here

As you can see, many lines are repeated (they represent the same entity) with the attribute 'category' being the only difference between each other. I would like to join those rows and include all the categories in a single value.

For example the attribute 'category' for Walmart should be: "Retail, Dowjones, SuperMarketChains".

Edit:

I would like the output table to be structured like this:

enter image description here

Edit 2:

What worked for me was:

df4.groupby(["ID azienda","Name","Company code", "Marketcap", "Share price", "Earnings", "Revenue", "Shares", "Employees"]
)['Category'].agg(list).reset_index()

Solution

  • Quick and Dirty

    df2=df.groupby("Name")['Category'].apply(','.join)
    
    subst=dict(df2)
    df['category']=df['Name'].replace(subst)
    df.drop_duplicates('Name')
    
    

    if you prefer multiple categories to be stored as a list in pandas column category... change first line to

    df2=df.groupby("Name")['Category'].apply(list)