How to avoid a loop over unique values in Pandas?

I have a table with users and times of their actions:

user_id	time	user_action
user_1	1	action_1
user_2	2	action_2
user_1	3	action_3
user_2	4	action_4
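To experiment with the answers below, the sample table can be built as a DataFrame like this. This is a minimal sketch that uses only the column names and values shown in the table:

```python
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame(
    {
        "user_id": ["user_1", "user_2", "user_1", "user_2"],
        "time": [1, 2, 3, 4],
        "user_action": ["action_1", "action_2", "action_3", "action_4"],
    }
)
print(df)
```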

My algorithm loops over the unique user IDs and, for each user, performs some operations using the shift function.


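import pandas as pd

users = df.user_id.unique()

df_2 = pd.DataFrame(columns = ...) # empty DataFrame

for user in users:
    # .copy() so that new columns go into an independent frame, not a view of df
    df_new = df[df.user_id == user].copy()
    # shift within this user's rows (df_new), not across the whole df
    df_new['...'] = df_new['user_action'].shift(1)  # ...
    # some actions using shift

    df_2 = pd.concat([df_2, df_new])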
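A runnable version of this loop on the sample data might look like the sketch below. The actual operations are elided in the question, so the `prev_action` and `cum_time` columns are hypothetical stand-ins (a per-user shift and a per-user cumsum). It also collects the pieces in a list and concatenates once at the end, which avoids re-copying `df_2` on every iteration:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "user_id": ["user_1", "user_2", "user_1", "user_2"],
        "time": [1, 2, 3, 4],
        "user_action": ["action_1", "action_2", "action_3", "action_4"],
    }
)

pieces = []  # one frame per user, concatenated once after the loop
for user in df["user_id"].unique():
    df_new = df[df["user_id"] == user].copy()
    # hypothetical stand-ins for the elided per-user operations
    df_new["prev_action"] = df_new["user_action"].shift(1)  # previous action of this user
    df_new["cum_time"] = df_new["time"].cumsum()  # running total of time for this user
    pieces.append(df_new)

# restore the original row order (the loop emits all rows of one user at a time)
df_2 = pd.concat(pieces).sort_index()
print(df_2)
```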

I also use cumsum, so I don't see how to implement the algorithm without splitting the data into per-user "df_new" pieces.

This approach is too slow, apparently because of the Python "for" loop. How can I do this with built-in Pandas functions instead?

Solution

Without seeing the full code, it's hard to tell exactly what you're trying to achieve.

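But have you looked at DataFrame.groupby()? It returns a DataFrameGroupBy object, which has its own shift() and cumsum() methods. These operate within each group, so every user's rows are shifted and summed separately, without splitting the frame by hand.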
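On the sample table, a per-user shift and a per-user cumsum could be sketched as follows. The output column names `prev_action` and `cum_time` are illustrative only. Note how the first row of each user gets NaN from shift(1), and how the running sum restarts for each user:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "user_id": ["user_1", "user_2", "user_1", "user_2"],
        "time": [1, 2, 3, 4],
        "user_action": ["action_1", "action_2", "action_3", "action_4"],
    }
)

# shift(1) moves values down one row *within each user*; each user's first row becomes NaN
prev_action = df.groupby("user_id")["user_action"].shift(1)
# cumsum() restarts for every user, so no manual splitting is needed
cum_time = df.groupby("user_id")["time"].cumsum()

# both results keep df's original index, so they line up row by row
out = df.assign(prev_action=prev_action, cum_time=cum_time)
print(out)
```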

Thus, you may be able to rewrite your code using groupby():

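df.groupby(by="user_id")["time"].shift(1).groupby(df["user_id"]).cumsum()

Note that groupby(...).shift() returns an ordinary Series (or DataFrame), not a grouped object, so the result has to be grouped by user_id again; otherwise cumsum() would run across all users.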
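To check that a grouped version reproduces the per-user loop, both can be computed on the sample data and compared. This sketch assumes the operation of interest is "shift time by one row, then take the running sum", a hypothetical stand-in for the question's elided steps. It also shows that calling cumsum() on the shifted result without grouping again mixes the users:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "user_id": ["user_1", "user_2", "user_1", "user_2"],
        "time": [1, 2, 3, 4],
        "user_action": ["action_1", "action_2", "action_3", "action_4"],
    }
)

# Loop version: split per user, shift, cumsum, stitch back together
pieces = []
for user in df["user_id"].unique():
    part = df.loc[df["user_id"] == user, "time"]
    pieces.append(part.shift(1).cumsum())
loop_result = pd.concat(pieces).sort_index()

# Grouped version: shift within each user, then cumsum within each user again
shifted = df.groupby("user_id")["time"].shift(1)
grouped_result = shifted.groupby(df["user_id"]).cumsum()

# Pitfall: cumsum() on the shifted Series without regrouping runs across users
ungrouped_result = shifted.cumsum()

print(loop_result.equals(grouped_result))
print(ungrouped_result.equals(grouped_result))
```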

Another option is DataFrameGroupBy.apply(), transform() or aggregate(), depending on the output format you expect. Each of these lets you apply your original per-user logic to every group:

import pandas as pd

def your_actions(group: pd.DataFrame) -> pd.DataFrame:
    ...  # whatever you did for each user

df.groupby(by="user_id").apply(your_actions)
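Below is a minimal sketch of both styles on the sample data. The body of `your_actions` and the helper `prev_then_cumsum` are hypothetical illustrations, not the question's actual logic. apply() passes each user's rows to the function as a DataFrame (here only the selected columns, which leaves out the grouping column), and `group_keys=False` keeps the original row index. transform() must return a result with the same length as each group, and pandas aligns it back to the original rows:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "user_id": ["user_1", "user_2", "user_1", "user_2"],
        "time": [1, 2, 3, 4],
        "user_action": ["action_1", "action_2", "action_3", "action_4"],
    }
)

def your_actions(group: pd.DataFrame) -> pd.DataFrame:
    # hypothetical per-user logic, the same kind of thing the original loop body did
    group = group.copy()
    group["prev_action"] = group["user_action"].shift(1)
    group["cum_time"] = group["time"].cumsum()
    return group

# apply(): one DataFrame per user; group_keys=False keeps df's original index
out = (
    df.groupby("user_id", group_keys=False)[["time", "user_action"]]
    .apply(your_actions)
    .sort_index()
)

def prev_then_cumsum(s: pd.Series) -> pd.Series:
    # hypothetical Series-level logic: previous value, then running total
    return s.shift(1).cumsum()

# transform(): returns a Series aligned with df's rows
res = df.groupby("user_id")["time"].transform(prev_then_cumsum)
print(out)
print(res)
```

Both variants still call a Python function once per group, so they are mainly tidier than the explicit loop. The larger speed-up typically comes from the built-in grouped methods such as shift() and cumsum(), which run in compiled code.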

You may get more precise answers if you provide a more complete code sample or a description of what you are trying to compute with shift and cumsum, but I hope this already points you in a possible direction.