I have a table with users and times of their actions:
user_id | time | user_action |
---|---|---|
user_1 | 1 | action_1 |
user_2 | 2 | action_2 |
user_1 | 3 | action_3 |
user_2 | 4 | action_4 |
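To make the snippets in this thread reproducible, the sample table can be rebuilt as a DataFrame. This is a sketch that copies only the four rows shown above; the column dtypes (integers for `time`, strings elsewhere) are an assumption.

```python
import pandas as pd

# Sample data copied from the table above
df = pd.DataFrame(
    {
        "user_id": ["user_1", "user_2", "user_1", "user_2"],
        "time": [1, 2, 3, 4],
        "user_action": ["action_1", "action_2", "action_3", "action_4"],
    }
)
print(df)
```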
My algorithm loops over the unique user IDs and, for each user, performs some operations using the `shift` function:
    import pandas as pd

    users = df.user_id.unique()
    df_2 = pd.DataFrame(columns=...)  # empty DataFrame
    for user in users:
        df_new = df[df.user_id == user].copy()  # rows of this user only
        df_new['...'] = df_new['user_action'].shift(1)  # ...
        # some actions using shift
        df_2 = pd.concat([df_2, df_new])
I also use cumulative sums (`cumsum`), so I don't see how to implement the algorithm without splitting the data into per-user `df_new` pieces.
This approach is too slow, apparently because of the Python `for` loop. How can I do this with built-in pandas functions instead?
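A minimal runnable version of the loop, using the sample table and two hypothetical stand-ins for the elided operations: `prev_action` (the user's previous action, via `shift`) and `total_time` (a running total of `time`, via `cumsum`). Neither column is from the original code.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "user_id": ["user_1", "user_2", "user_1", "user_2"],
        "time": [1, 2, 3, 4],
        "user_action": ["action_1", "action_2", "action_3", "action_4"],
    }
)

# Hypothetical per-user operations standing in for the elided ones
parts = []
for user in df["user_id"].unique():
    df_new = df[df["user_id"] == user].copy()  # rows of one user only
    df_new["prev_action"] = df_new["user_action"].shift(1)
    df_new["total_time"] = df_new["time"].cumsum()
    parts.append(df_new)

# Concatenating once at the end avoids copying df_2 again on every iteration
df_2 = pd.concat(parts).sort_index()
print(df_2)
```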
Without seeing the full code, it's hard to tell exactly what you're trying to achieve.
But have you looked at `DataFrame.groupby()`? It returns a `DataFrameGroupBy` object, which also offers `shift()` and `cumsum()` methods. Both operate within each group, so the shift and the running sum restart for every user.
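A quick check on the sample table, using the same hypothetical `prev_action` and `total_time` columns as an illustration: the grouped `shift()` and `cumsum()` produce per-user results without any Python loop, and they keep the original row index, so they can be assigned straight back to `df`.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "user_id": ["user_1", "user_2", "user_1", "user_2"],
        "time": [1, 2, 3, 4],
        "user_action": ["action_1", "action_2", "action_3", "action_4"],
    }
)

grouped = df.groupby("user_id")

# shift() works within each user: the first row of every user becomes NaN
df["prev_action"] = grouped["user_action"].shift(1)

# cumsum() also restarts for every user
df["total_time"] = grouped["time"].cumsum()
print(df)
```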
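Thus, you may be able to rewrite your code using `groupby()`. Note that `shift()` returns an ordinary (ungrouped) result, so group it again before calling `cumsum()` if the sum should also restart for each user, e.g. for the numeric `time` column:

    shifted = df.groupby(by="user_id")["time"].shift(1)
    result = shifted.groupby(df["user_id"]).cumsum()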
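Chaining `.cumsum()` directly onto the grouped `shift()` sums across all rows, because the shifted result is no longer grouped. A small comparison on the `time` column of the sample table (the column choice is an assumption for illustration):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "user_id": ["user_1", "user_2", "user_1", "user_2"],
        "time": [1, 2, 3, 4],
    }
)

# Per-user shift: NaN, NaN, 1.0, 2.0
shifted = df.groupby("user_id")["time"].shift(1)

# cumsum() on the plain Series mixes users: NaN, NaN, 1.0, 3.0
wrong = shifted.cumsum()

# Grouping the shifted values again restarts the sum per user: NaN, NaN, 1.0, 2.0
right = shifted.groupby(df["user_id"]).cumsum()
print(pd.DataFrame({"wrong": wrong, "right": right}))
```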
Another option is `DataFrameGroupBy.apply()`, `transform()`, or `aggregate()`, depending on your expected output format, to apply your original logic to each group:
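The "expected output format" is mainly a question of shape: `transform()` returns one value per original row, while `aggregate()` returns one value per group. A sketch on the `time` column of the sample table, where the lambda is a hypothetical example of custom per-user logic:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "user_id": ["user_1", "user_2", "user_1", "user_2"],
        "time": [1, 2, 3, 4],
    }
)

grouped = df.groupby("user_id")["time"]

# transform(): same index as df, one value per row
per_row = grouped.transform("cumsum")

# transform() with custom logic that is run on each user's Series separately
custom = grouped.transform(lambda s: s.shift(1).cumsum())

# aggregate(): one value per user, indexed by user_id
per_user = grouped.aggregate("sum")
print(per_row, custom, per_user, sep="\n\n")
```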
    import pandas as pd

    def your_actions(group: pd.DataFrame) -> pd.DataFrame:
        ...  # whatever you did for each user (shift, cumsum, ...)
        return group

    result = df.groupby(by="user_id", group_keys=False).apply(your_actions)
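A runnable sketch of the `apply()` route, where the body of `your_actions` is filled with the same hypothetical steps as before (`prev_action` via `shift`, `total_time` via `cumsum`); substitute your real per-user logic. Selecting the needed columns explicitly keeps `user_id` out of the groups passed to the function, which avoids a deprecation warning in recent pandas versions.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "user_id": ["user_1", "user_2", "user_1", "user_2"],
        "time": [1, 2, 3, 4],
        "user_action": ["action_1", "action_2", "action_3", "action_4"],
    }
)


def your_actions(group: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical per-user logic: previous action and running total of time
    group = group.copy()
    group["prev_action"] = group["user_action"].shift(1)
    group["total_time"] = group["time"].cumsum()
    return group


# group_keys=False keeps the original row index instead of prepending user_id
result = (
    df.groupby("user_id", group_keys=False)[["time", "user_action"]]
    .apply(your_actions)
    .sort_index()
)
result.insert(0, "user_id", df["user_id"])  # put the grouping column back
print(result)
```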
You may get more precise answers if you provide a more complete code sample or a description of what you are trying to compute with `shift` and `cumsum`, but I hope this already points you in a possible direction.