Search code examples
pythonpandasdataframegroup-by

pandas groupby, transform and join


I have a dataframe and I would like to groupby and join the strings of a column together. So something like the below.

df = pd.DataFrame({
    'id': [1, 1, 2, 2, 3, 3], 
    'txt': ['sth', 'sth else', 'sth', 'one more thing', 'sth else', 'sth else'],
    'status': ['open', 'open', 'closed', 'open', 'open', 'open']})

df.assign(output=
    df.where(df.status=='open')
      .groupby(df.id)
      .txt.transform(lambda col: ', '.join(col.fillna(''))))

which gives me this

   id   txt             status  output
0   1   sth             open    sth, sth else
1   1   sth else        open    sth, sth else
2   2   sth             closed  , one more thing
3   2   one more thing  open    , one more thing
4   3   sth else        open    sth else, sth else
5   3   sth else        open    sth else, sth else

is there a way to

  1. not have duplicate values (like in row 4 and 5)
  2. not have a leading comma if a status is 'closed' (like in row 2 and 3) so that I would get
   id   txt             status  output
0   1   sth             open    sth, sth else
1   1   sth else        open    sth, sth else
2   2   sth             closed  one more thing
3   2   one more thing  open    one more thing
4   3   sth else        open    sth else
5   3   sth else        open    sth else

Solution

  • Instead of fillna, use dropna and combine with drop_duplicates:

    df.assign(output=
        df.where(df['status'].eq('open'))
          .groupby(df['id'])['txt']
          .transform(lambda col: ', '.join(col.dropna().drop_duplicates()))
             )
    

    Output:

       id             txt  status          output
    0   1             sth    open   sth, sth else
    1   1        sth else    open   sth, sth else
    2   2             sth  closed  one more thing
    3   2  one more thing    open  one more thing
    4   3        sth else    open        sth else
    5   3        sth else    open        sth else