Tags: python, pandas, join

For a list of users and activity dates, how do I find each user's earliest date?


I have a list of users and their activities, like this:

user   date    activity
Tom    1/1/21  Hop
Dick   1/2/21  Skip
Harry  2/2/21  Jump
Tom    1/3/21  Skip
Dick   1/4/21  Jump
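To make the example reproducible, the table above can be loaded into a pandas DataFrame. This is a minimal sketch that assumes the dates are month/day/two-digit-year strings and names the frame `user_actions`, matching the variable used in the question.

```python
import pandas as pd

# Sample data copied from the table above; dates stay as M/D/YY strings for now
user_actions = pd.DataFrame(
    {
        "user": ["Tom", "Dick", "Harry", "Tom", "Dick"],
        "date": ["1/1/21", "1/2/21", "2/2/21", "1/3/21", "1/4/21"],
        "activity": ["Hop", "Skip", "Jump", "Skip", "Jump"],
    }
)
print(user_actions)
```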

I want to extract each unique user name together with that user's earliest activity date, to get a result like this:

user   first activity
Tom    1/1/21
Dick   1/2/21
Harry  2/2/21

I know I can create an array of unique usernames like this:

unique_users = user_actions['user'].unique()

But I don't know how to turn that array of unique usernames into a DataFrame that also holds each user's first activity date.
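`unique()` returns only the names, so the dates are lost at that point. A related approach that keeps whole rows is to sort by date and then drop duplicate users, keeping the first (earliest) row for each. This is a sketch using `DataFrame.drop_duplicates`, an alternative to the `groupby` answer, and it assumes the dates are M/D/YY strings. A side effect is that the matching `activity` is kept as well.

```python
import pandas as pd

user_actions = pd.DataFrame(
    {
        "user": ["Tom", "Dick", "Harry", "Tom", "Dick"],
        "date": ["1/1/21", "1/2/21", "2/2/21", "1/3/21", "1/4/21"],
        "activity": ["Hop", "Skip", "Jump", "Skip", "Jump"],
    }
)

# Parse the strings so sorting is chronological, not alphabetical
user_actions["date"] = pd.to_datetime(user_actions["date"], format="%m/%d/%y")

first = (
    user_actions.sort_values("date")                 # earliest rows come first
    .drop_duplicates(subset="user", keep="first")    # keep one row per user
    .rename(columns={"date": "first_activity"})
    .reset_index(drop=True)
)
print(first[["user", "first_activity"]])
```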


Solution

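  • To get the desired result, you can do the following:

    • Convert the date column to datetime, so that comparisons are chronological rather than alphabetical
    • Group by user and take the minimum date for each user

    import pandas as pd

    # df is the user_actions DataFrame from the question
    # Dates such as "1/2/21" are parsed as month/day/two-digit year
    df['date'] = pd.to_datetime(df['date'], format='%m/%d/%y')
    result = (
        df.groupby("user")["date"]
        .agg(first_activity="min")   # earliest date per user, in a column named first_activity
        .reset_index()               # turn the user index back into a regular column
        .sort_values("first_activity")
    )
    print(result)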
    

    Output:

        user first_activity
    2    Tom     2021-01-01
    0   Dick     2021-01-02
    1  Harry     2021-02-02
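Converting the date column first matters because `min` on raw strings compares them character by character. The short sketch below uses two hypothetical dates (not taken from the question) to show the difference.

```python
import pandas as pd

# Hypothetical dates in M/D/YY form: October 1 is later than February 1
dates = pd.Series(["10/1/21", "2/1/21"])

string_min = dates.min()  # lexicographic: "1" < "2", so "10/1/21" is picked
parsed_min = pd.to_datetime(dates, format="%m/%d/%y").min()  # chronological

print(string_min)
print(parsed_min.date())
```

With strings, October is reported as the earliest date; after parsing, February 1 is correctly returned.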