Imagine I have a df on which I need to perform operations based on grouped columns, but I need to perform actions based on two different groupings. Having cols A, B, C, I need to do operation x to df grouped by A, B and operation y to df grouped only by B. Do I need to group the dataframe twice?
using DataFrames
df = DataFrame(rand(160, 3), :auto)
rename!(df,[:A,:B,:Z])
@. df.B = ifelse(rand() < 0.5, 1, 2)
@. df.A = ifelse(rand() < 0.5, 1, 2)
# I group here by A and B
gd = groupby(df, [:A, :B])
#=
My operations with df grouped by A and B.
... But now I need to perform only with B
=#
How to remove key A?
gd.removegroup([:A])
gd.removekey([:A])
gd.ungroup([:A])
df=DataFrame(rand(160,3), :auto)
rename!(df,[:A,:B,:Z])
@. df.B = ifelse(rand() < 0.5, 1, 2)
@. df.A = ifelse(rand() < 0.5, 1, 2)
# I group here by B
gd = groupby(df, [:B])
#=
My operations with df grouped by B.
... But now I need to perform with B and A
=#
How to add key A?
groupby(gd, [:A]) ❌❌❌❌
gd.addkey([:A])
gd.addgroup([:A])
Do I need to group the dataframe twice?
Yes. Grouping twice takes the same amount of code as adding or removing a grouping key would. Just do e.g.:
gd1 = groupby(df, [:A, :B])
gd2 = groupby(df, :B)
Since a grouped data frame is a view of the source df, if you mutate gd1 the changes will be reflected in gd2 automatically.
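A minimal sketch of that view behaviour, assuming a small hand-made table (the values below are made up for illustration and are not from the question): a value written through a group of gd1 shows up in df and in the matching group of gd2.

```julia
using DataFrames

# Small deterministic table; the values are illustrative only.
df = DataFrame(A = [1, 1, 2, 2], B = [1, 2, 1, 2], Z = [10.0, 20.0, 30.0, 40.0])

gd1 = groupby(df, [:A, :B])
gd2 = groupby(df, :B)

# Each group is a SubDataFrame, i.e. a view into df.
g = gd1[(A = 1, B = 2)]   # the single row with A == 1 and B == 2

# Write to a non-grouping column through the view.
g[1, :Z] = 0.0

# The same row seen through the other grouping now holds the new value.
z_in_gd2 = gd2[(B = 2,)].Z   # Z values of the rows with B == 2
```

Only the non-grouping column :Z is modified here, so both groupings stay valid.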
The only thing you need to keep in mind is that when mutating df you should not mutate columns :A and :B, as mutating grouping columns could invalidate the groupings.
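For completeness, here is a rough sketch of running two different operations on the two groupings. The question does not say what operations x and y are, so `sum` and `mean` are used purely as stand-ins, and the output column names `Z_sum` and `Z_mean` are made up for this example.

```julia
using DataFrames
using Statistics: mean

# Small deterministic table; the values are illustrative only.
df = DataFrame(A = [1, 1, 2, 2], B = [1, 2, 1, 2], Z = [10.0, 20.0, 30.0, 40.0])

# Stand-in for "operation x": sum of Z within each (A, B) group.
res_x = combine(groupby(df, [:A, :B]), :Z => sum => :Z_sum)

# Stand-in for "operation y": mean of Z within each B group.
res_y = combine(groupby(df, :B), :Z => mean => :Z_mean)
```

Neither call touches the grouping columns, so the caveat above does not come into play.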