python pandas dataframe data-analysis

Grouping all rows of a pandas DataFrame (with many columns) that share the same value in a given column


I have been searching for hours. I have a DataFrame like this:

     col1.  col2.   col3.   col4
row1.  a.    p       u       0
row2.  b.    q       v       1
row3.  a.    r       w       2
row4.  d.    s       x       3
row5.  b.    t       y       4

Now I want to group all these rows by the value of 'col1' so that I get:

     col1.  col2.   col3.   col4
row1.  a.    p r     u w    0,2
row2.  b.    q t     v y    1,4
row3.  d.    s       x       3

I found that df.groupby('col1')['col2'].apply(' '.join) joins all the values in 'col2' that share the same value of 'col1'. But I am unable to extend this command so that the rows of every column are grouped together to produce the output shown above.


The above DataFrame is just for illustration. The actual DataFrame has around 100 rows and columns, and every cell stores feedback except for col1, which stores the name of the item the feedback is about. I want to group all columns by item (col1) and then perform sentiment analysis on the DataFrame.


Solution

  • You can use:

    df1 = df.astype(str).groupby('col1').agg(','.join).reset_index()
    print(df1)
      col1 col2 col3 col4
    0   a.  p,r  u,w  0,2
    1   b.  q,t  v,y  1,4
    2   d.    s    x    3
    

    If you also need the original index labels:

    df1 = df.astype(str).groupby('col1').agg(','.join).reset_index()
    df1.index = df.drop_duplicates('col1').index
    print(df1)
          col1 col2 col3 col4
    row1.   a.  p,r  u,w  0,2
    row2.   b.  q,t  v,y  1,4
    row4.   d.    s    x    3
    

    Explanation:

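    1. First cast all columns to strings with astype, so that join also works on the numeric col4.
    2. Then group by col1 with groupby and join each group's values with agg.
    3. If you also need the index label of the first row for each col1 value, take it from drop_duplicates. The labels line up here because the sorted group keys are already in order of first appearance.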
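
The steps above can be put together in a self-contained sketch. The frame is rebuilt from the question's table, with the trailing dots dropped as an assumption about the real data. Passing sort=False is an addition that is not in the answer: it keeps the groups in order of first appearance, so they should match the labels from drop_duplicates even when col1 is not already sorted.

```python
import pandas as pd

# Rebuild the question's example frame (values copied from the question,
# trailing dots dropped as an assumption about the real data).
df = pd.DataFrame(
    {
        "col1": ["a", "b", "a", "d", "b"],
        "col2": ["p", "q", "r", "s", "t"],
        "col3": ["u", "v", "w", "x", "y"],
        "col4": [0, 1, 2, 3, 4],
    },
    index=["row1", "row2", "row3", "row4", "row5"],
)

# Cast everything to str so that str.join also works on the numeric col4,
# then join the values of every remaining column within each col1 group.
# sort=False (not in the original answer) keeps first-appearance order.
joined = df.astype(str).groupby("col1", sort=False).agg(",".join)

# Label each group with the index of its first row. drop_duplicates keeps
# the first occurrence of each col1 value, in the original row order.
df1 = joined.reset_index()
df1.index = df.drop_duplicates("col1").index

print(df1)
```

With this input, the row for a holds p,r, u,w and 0,2 and is labelled row1. Replacing ",".join with " ".join gives the space-separated form shown in the question.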