python, string, pandas, dataframe, concatenation

Concatenate strings cumulatively in a dataframe


I have a dataframe like this:

Time       User   Route
11:03:01   1234   home
11:03:04   1234   category
11:03:10   1234   product
11:03:21   1234   cart
11:04:01   4321   home
11:04:04   4321   category
11:04:10   4321   product
11:04:21   4321   cart

I want to add a Journey column that accumulates each user's routes in order:

Time       User   Route        Journey
11:03:01   1234   home         home
11:03:04   1234   category     home, category
11:03:10   1234   product      home, category, product
11:03:21   1234   cart         home, category, product, cart
11:04:01   4321   home         home
11:04:04   4321   category     home, category
11:04:10   4321   product      home, category, product
11:04:21   4321   cart         home, category, product, cart

How can I do this in a dataframe?


Solution

  • You can use a grouped cumulative sum, since cumsum on an object-dtype string column concatenates the values:

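    # Append the separator to every route, build a running concatenation
    # within each User group, then drop the trailing ', ' (2 characters).
    df['Journey'] = (df['Route'].add(', ')
                       .groupby(df['User'])
                       .transform(lambda x: x.cumsum().str[:-2])
                    )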

    Output:

           Time  User     Route                        Journey
    0  11:03:01  1234      home                           home
    1  11:03:04  1234  category                 home, category
    2  11:03:10  1234   product        home, category, product
    3  11:03:21  1234      cart  home, category, product, cart
    4  11:04:01  4321      home                           home
    5  11:04:04  4321  category                 home, category
    6  11:04:10  4321   product        home, category, product
    7  11:04:21  4321      cart  home, category, product, cart
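
For anyone who wants to run this end to end, here is a self-contained sketch that rebuilds the sample frame from the question and produces the same Journey column. It swaps the string cumsum for `itertools.accumulate`, which is an alternative technique rather than the method shown above, in case string cumsum is not supported for your column's dtype. The helper name `running_join` and its `sep` parameter are made up for illustration, and the sketch assumes the rows are already in time order within each user.

```python
from itertools import accumulate

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Time": ["11:03:01", "11:03:04", "11:03:10", "11:03:21",
             "11:04:01", "11:04:04", "11:04:10", "11:04:21"],
    "User": [1234, 1234, 1234, 1234, 4321, 4321, 4321, 4321],
    "Route": ["home", "category", "product", "cart"] * 2,
})


def running_join(routes, sep=", "):
    """Return the running, separator-joined path for one user's routes.

    Hypothetical helper: `routes` is the Route Series of a single group.
    """
    # accumulate yields the first value unchanged, then path + sep + step.
    paths = accumulate(routes, lambda path, step: path + sep + step)
    # Keep the group's index so transform can align the result with df.
    return pd.Series(list(paths), index=routes.index)


# transform calls running_join once per User group.
df["Journey"] = df.groupby("User")["Route"].transform(running_join)

print(df)
```

Because the separator is only inserted between steps, there is no trailing ", " to strip afterwards. If the frame might not be sorted, sorting by User and Time first would be a reasonable precaution.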