Julia: Remove or add key from GroupedDataFrame


Imagine I have a df on which I need to perform operations based on grouped columns, but I need to perform actions based on two different groupings.

Given columns A, B, and Z, I need to apply operation x to df grouped by A and B, and operation y to df grouped only by B. Do I need to group the data frame twice?

Case 1

using DataFrames

df = DataFrame(rand(160, 3), :auto)
rename!(df, [:A, :B, :Z])

@. df.B = ifelse(rand() < 0.5, 1, 2)
@. df.A = ifelse(rand() < 0.5, 1, 2)

# I group here by A and B
gd = groupby(df, [:A, :B])

#=
    My operations with df grouped by A and B.
    ... But now I need to perform only with B
=#

How to remove key A?

gd.removegroup([:A])
gd.removekey([:A])
gd.ungroup([:A])

Case 2

df=DataFrame(rand(160,3), :auto)
rename!(df,[:A,:B,:Z])

@. df.B = ifelse(rand() < 0.5, 1, 2)
@. df.A = ifelse(rand() < 0.5, 1, 2)

# I group here by B
gd = groupby(df, [:B])

#=
    My operations with df grouped by B.
    ... But now I need to perform with B and A
=#

How to add key A?

groupby(gd, [:A])  # ❌ does not work
gd.addkey([:A])
gd.addgroup([:A])

Solution

  • Do I need to group the dataframe twice?

    Yes. Grouping twice takes no more code than adding or removing a grouping key would. For example:

    gd1 = groupby(df, [:A, :B])
    gd2 = groupby(df, :B)
    

    Since a grouped data frame is a view of the source df, if you mutate df through gd1, the changes are reflected in gd2 automatically.
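
To illustrate the view behaviour, here is a minimal sketch on a small hand-made table (the values and the "multiply Z by 10" operation are made up for illustration, not taken from the question). It mutates the non-grouping column Z through the groups of gd1 and then aggregates through gd2 without regrouping:

```julia
using DataFrames

# Small deterministic table; column names follow the question, values are illustrative.
df = DataFrame(A = [1, 1, 2, 2], B = [1, 2, 1, 2], Z = [1.0, 2.0, 3.0, 4.0])

gd1 = groupby(df, [:A, :B])  # grouping used for "operation x"
gd2 = groupby(df, :B)        # grouping used for "operation y"

# Each group of gd1 is a SubDataFrame, so g.Z is a view into df.Z.
for g in gd1
    z = g.Z    # view of this group's rows in the parent column
    z .*= 10   # in-place update, so df itself changes
end

# gd2 was created before the mutation but sees the updated values.
res = combine(gd2, :Z => sum => :Zsum)
```

Only Z is modified here; the grouping columns A and B are left alone.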

    The only thing to keep in mind is that you should not mutate columns :A and :B when mutating df, because mutating grouping columns could invalidate the groupings.
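
If only the grouped object is at hand, one possible way to get the effect of "removing" or "adding" a key is to regroup its parent table. The sketch below assumes the `parent` and `groupcols` accessors that DataFrames.jl provides for a GroupedDataFrame; the helper names `dropkeys` and `addkeys` are hypothetical and not part of the package. Note that this still groups the data frame a second time, it just derives the new key list from the existing grouping:

```julia
using DataFrames

# Hypothetical helpers (not part of DataFrames.jl): both regroup the parent table.
dropkeys(gd::GroupedDataFrame, cols) = groupby(parent(gd), setdiff(groupcols(gd), cols))
addkeys(gd::GroupedDataFrame, cols) = groupby(parent(gd), union(groupcols(gd), cols))

# Illustrative data; column names follow the question.
df = DataFrame(A = [1, 1, 2, 2], B = [1, 2, 1, 2], Z = [1.0, 2.0, 3.0, 4.0])

gd = groupby(df, [:A, :B])     # Case 1: start grouped by A and B
gd_b = dropkeys(gd, [:A])      # regrouped by B only
gd_ab = addkeys(gd_b, [:A])    # Case 2: regrouped by B and A
```

Both helpers return a new GroupedDataFrame over the same parent df, so the caveat about not mutating grouping columns applies to them as well.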