Tags: python, pandas, sorting, pandas-groupby, string-concatenation

Concatenating strings after grouping by name and then sorting by date


I have this data in a data frame:


data = [
           {'name' : 'a', 'date' : '2020-01-02', 'message' : 'there'},
           {'name' : 'b', 'date' : '2020-01-01', 'message' : 'Hello'},
           {'name' : 'a', 'date' : '2020-01-01', 'message' : 'Hi'},
           {'name' : 'b', 'date' : '2020-01-03', 'message' : 'everyone'},
           {'name' : 'c', 'date' : '2020-01-05', 'message' : 'Test'}
       ]

What I would like to do is group by name, sort each group by date, and concatenate the messages for each name so that the data looks like this:

[
   {'name' : 'a', 'message' : 'Hi there'},
   {'name' : 'b', 'message' : 'Hello everyone'},
   {'name' : 'c', 'message' : 'Test'}
]

I have already been able to group by name and sort by date (after converting the date strings to datetime objects) using this:

df.groupby(['name']).apply(lambda x: x.sort_values(['date']))

but I am not sure how you would concatenate the strings together once you have grouped and sorted the data.


Solution

  • Sort by date first, then group by name and join each group's messages with apply:

    df.sort_values('date').groupby('name')['message'].apply(' '.join).reset_index()
    
      name         message
    0    a        Hi there
    1    b  Hello everyone
    2    c            Test
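
For completeness, here is a minimal end-to-end sketch of the same approach, starting from the list of dicts in the question. It assumes the date column should be parsed with pd.to_datetime before sorting, and it relies on groupby keeping the rows of each group in their existing (already sorted) order. The variable names result and records are only illustrative.

```python
import pandas as pd

data = [
    {'name': 'a', 'date': '2020-01-02', 'message': 'there'},
    {'name': 'b', 'date': '2020-01-01', 'message': 'Hello'},
    {'name': 'a', 'date': '2020-01-01', 'message': 'Hi'},
    {'name': 'b', 'date': '2020-01-03', 'message': 'everyone'},
    {'name': 'c', 'date': '2020-01-05', 'message': 'Test'},
]

df = pd.DataFrame(data)

# Parse the date strings so the sort is chronological
# (assumption: all dates are in an ISO-like format that to_datetime accepts).
df['date'] = pd.to_datetime(df['date'])

# Sort the whole frame by date, then group by name; rows inside each
# group keep that order, so the joined messages come out date-ordered.
result = (
    df.sort_values('date')
      .groupby('name')['message']
      .apply(' '.join)
      .reset_index()
)

# Convert back to a list of dicts, matching the shape asked for in the question.
records = result.to_dict('records')
print(records)
```

If several rows of one name can share the same date, passing kind='stable' to sort_values keeps their original relative order.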