Search code examples
python-3.xpandascsvpandas-groupbydata-analysis

How to Groupby based on multiple field and display all columns


I am trying to create a csv file where if few columns are same then i will merge row with similar value into one row .

eg:

Input :

Party_No install_date  Start_date  End_date  Product_family  Version City state 
 111     24-05-2018    25-05-2019 21-03-2020  storage           1     LA   USA
 111     24-05-2018    25-05-2019 21-03-2020  storage           1     KA   USA
 111     24-05-2018    25-05-2019 21-03-2020   storage          2     PA   UK

Output

Party_No install_date  Start_date  End_date  Product_family Version City   state 
111     24-05-2018    25-05-2019 21-03-2020  storage         1,2 LA,KA,PA UK,USA

ex : in my case

if value of party_number , item_install_date ,Contract_subline_date , Contract_Subline_end_date , Instance_family

i will merger row with same value into one row . other column apart from above mentioned will have comma separated value

Input CSV file link

Expected output CSV link

Code i tried:

import pandas as pd
import np

  df = None
  df = pd.read_csv("Export.csv")
  df.fillna(0,inplace=True)



pf=df.groupby(['PARTY_NUMBER','ITEM_INSTALL_DATE','CONTRACT_SUBLINE_START_DATE','CONTRACT_SUBLINE_END_DATE','INSTANCE_PRODUCT_FAMILY']).agg([','.join])

pf.to_csv("result1.csv", index=False)

Solution

  • Adding the unqiue (or set when order is not important)

    df.groupby(['...']).agg(lambda x : ','.join(x.unique())) # set(x)