Tags: python, pandas, dataframe, grouping

Grouping a CSV file based on a column in pandas


I have a CSV file in which each row holds specific data:

[image: input data, not preserved]

The expected output is:

[image: expected output, not preserved]

  • Condition 1: wherever source is 'a', the column headers of that row's data need to be renamed with the prefix 'seed_'.
  • Condition 2: the bundle id can be used for grouping.

Is this doable in pandas?


Solution

  • You can use boolean indexing to split your DataFrame (rows where source is 'a' vs. all other rows), then use merge to join the two parts on 'bundle id':

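    # Boolean mask: True for the seed rows (source 'a')
    m = df['source'] == 'a'
    # Join seed rows to the other rows; shared column names get the suffixes
    out = df[m].drop(columns='source').merge(df[~m], on='bundle id', suffixes=('_seed', '_comp'))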
    

    Output:

    >>> out
      name_seed  bundle id  price_seed  name_comp  price_comp source
    0    iphone        123         999  iphone 12         950      b
    1    iphone        123         999  iphone 13         975      c
    2     apple        345         100    Apple 1          99      c
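
For reference, here is a self-contained sketch of the approach above. The input frame is an assumption: the original data was posted as an image, so the rows below are reconstructed from the output shown. The second half is a possible variant, not part of the answer itself, that produces the 'seed_' prefix asked for in the question instead of a suffix; the names `seed` and `out_prefixed` are hypothetical.

```python
import pandas as pd

# Assumed input, reconstructed from the output shown above
# (the original data was only available as an image).
df = pd.DataFrame({
    "name": ["iphone", "iphone 12", "iphone 13", "apple", "Apple 1"],
    "bundle id": [123, 123, 123, 345, 345],
    "price": [999, 950, 975, 100, 99],
    "source": ["a", "b", "c", "a", "c"],
})

# Boolean mask: True for the seed rows (source 'a')
m = df["source"] == "a"

# Approach from the answer: overlapping columns get '_seed' / '_comp' suffixes
out = df[m].drop(columns="source").merge(
    df[~m], on="bundle id", suffixes=("_seed", "_comp")
)
print(out)

# Variant (assumption): rename the seed columns with a 'seed_' prefix first,
# leaving the join key 'bundle id' untouched, then merge without suffixes.
seed = df[m].drop(columns="source").rename(
    columns=lambda c: c if c == "bundle id" else f"seed_{c}"
)
out_prefixed = seed.merge(df[~m], on="bundle id")
print(out_prefixed)
```

Renaming before the merge means no column names collide, so the non-seed columns keep their original names.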