pandas, dataframe, concatenation

Conditionally concatenate rows of a dataframe and process additional columns based on the condition


I have an input DataFrame like the following:

NAME    TEXT                                            START   END
Tim     Tim Wagner is a teacher.                        10      20.5
Tim     He is from Cleveland, Ohio.                     20.5    40
Frank   Frank is a musician.                            40      50
Tim     He like to travel with his family               50      62
Frank   He is a performing artist who plays the cello.  62      70
Frank   He performed at the Carnegie Hall last year.    70      85
Frank   It was fantastic listening to him.              85      90

I want the output DataFrame to look as follows, with consecutive rows that share the same NAME merged into a single row:

NAME    TEXT                                                                                                                            START       END
Tim     Tim Wagner is a teacher.  He is from Cleveland, Ohio.                                                                           10          40  
Frank   Frank is a musician.                                                                                                            40          50
Tim     He like to travel with his family                                                                                               50          62
Frank   He is a performing artist who plays the cello. He performed at the Carnegie Hall last year. It was fantastic listening to him.  62          90

Appreciate your help on this.

Thanks


Solution

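  • Try labelling each run of consecutive rows that share a NAME, then aggregating within each run:

    # A new group starts whenever NAME differs from the previous row.
    grp = (df['NAME'] != df['NAME'].shift()).cumsum().rename('group')
    # Select columns with a list (double brackets); recent pandas rejects the bare tuple form.
    out = (df.groupby(['NAME', grp], sort=False)[['TEXT', 'START', 'END']]
             .agg({'TEXT': ' '.join, 'START': 'min', 'END': 'max'})
             .reset_index()
             .drop('group', axis=1))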
    

    Output:

        NAME                                               TEXT  START   END
    0    Tim  Tim Wagner is a teacher. He is from Cleveland,...   10.0  40.0
    1  Frank                               Frank is a musician.   40.0  50.0
    2    Tim                  He like to travel with his family   50.0  62.0
    3  Frank  He is a performing artist who plays the cello....   62.0  90.0
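    For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch of the same shift/cumsum idea. It rebuilds the input from the question and uses named aggregation instead of a dict; the helper name merge_consecutive and the intermediate name run_id are made up for illustration and are not part of pandas.

```python
import pandas as pd

# Input data copied from the question.
df = pd.DataFrame({
    "NAME": ["Tim", "Tim", "Frank", "Tim", "Frank", "Frank", "Frank"],
    "TEXT": [
        "Tim Wagner is a teacher.",
        "He is from Cleveland, Ohio.",
        "Frank is a musician.",
        "He like to travel with his family",
        "He is a performing artist who plays the cello.",
        "He performed at the Carnegie Hall last year.",
        "It was fantastic listening to him.",
    ],
    "START": [10, 20.5, 40, 50, 62, 70, 85],
    "END": [20.5, 40, 50, 62, 70, 85, 90],
})


def merge_consecutive(frame: pd.DataFrame) -> pd.DataFrame:
    """Merge consecutive rows that share the same NAME (hypothetical helper)."""
    # The comparison is True on the first row of each run, so the
    # cumulative sum gives every run its own integer label.
    run_id = (frame["NAME"] != frame["NAME"].shift()).cumsum().rename("run")
    return (
        frame.groupby(run_id, sort=False)
        .agg(
            NAME=("NAME", "first"),   # NAME is constant within a run
            TEXT=("TEXT", " ".join),  # join the sentences with a space
            START=("START", "min"),   # earliest start in the run
            END=("END", "max"),       # latest end in the run
        )
        .reset_index(drop=True)
    )


result = merge_consecutive(df)
print(result.to_string())
```

    Grouping on the run label alone is enough here because NAME cannot change inside a run, so it can be carried along with "first" rather than being used as a second grouping key.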