How can I calculate session IDs from event data, assigning a new ID every time there was no event for 15 minutes?

I have a huge file with events tracked a website. The data contains, among others, the user_id and the time_stamp of events (clicked on a link, viewed an image, etc.). Here is a simplified example:

#%%
import pandas as pd

# make a dataframe
df = pd.DataFrame([['123', 1],
                   ['123', 1],
                   ['123', 19],
                   ['234', 7],
                   ['234', 28],
                   ['234', 29]],
                   columns=['user_id', 'time_stamp'])
print(df)

What I would like to obtain is a session_id column, which is counting the sessions for each user. (Alternatively a string with the user_id and the time_stamp concatenated, but I assume counting is simpler?) I want it to look somewhat like this:

# make a dataframe
df = pd.DataFrame([['123', 1,  0],
                   ['123', 1,  0],
                   ['123', 19, 1],
                   ['234', 7,  0],
                   ['234', 28, 1],
                   ['234', 29, 1]],
                   columns=['user_id', 'time_stamp', session_id])
print(df)

I read quite a lot, and tried even more, but I just can't figure out how to do it without a for loop. There is probably some .shift(1) involved and something with .groupby()? Any help is appreciated.

Solution

Try:

df['session_id'] = df.groupby('user_id')['time_stamp'].diff() >= 15
df['session_id'] = df.groupby('user_id')['session_id'].cumsum()

print(df)

Prints:

  user_id  time_stamp  session_id
0     123           1           0
1     123           1           0
2     123          19           1
3     234           7           0
4     234          28           1
5     234          29           1