How can I count rows by group while specifying the name of the count column?


I am trying to count the number of rows by group in a DataFrame. The following code generates a new column, called x1, which has the intended information:

by(df, [:grouping_var_1, :grouping_var_2], nrow) 

However, I don't know how to generate such a column with a name other than x1. The solution I have found so far is:

@pipe df |> by(_, [:grouping_var_1, :grouping_var_2], nrow) |> rename(_, :x1 => :my_desired_name);
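For readers on DataFrames.jl 0.21 or later, where by is deprecated, the same rename-based workaround can be sketched with combine and groupby. The sample data below is hypothetical; note that with these versions the automatically generated count column is named nrow rather than x1.

```julia
using DataFrames

# Hypothetical sample data; the column names mirror the question
df = DataFrame(grouping_var_1 = ["a", "a", "b", "b", "b"],
               grouping_var_2 = [1, 1, 1, 2, 2])

gd = groupby(df, [:grouping_var_1, :grouping_var_2])

# With DataFrames.jl 0.21+, passing `nrow` alone yields a column named :nrow
counts = combine(gd, nrow)

# The rename workaround from the question, translated: rename :nrow afterwards
renamed = rename(counts, :nrow => :my_desired_name)
```

This still needs the extra rename step, which is exactly what the question wants to avoid.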

Is there any way I could do this directly, without having to use rename?

Thanks in advance.


Solution

  • Please update DataFrames.jl to version 0.21 or later.

    Then use:

    combine(groupby(df, [:grouping_var_1, :grouping_var_2]), nrow => :my_desired_name)
    

    Two comments:

    • by is deprecated, so you should not use it (you can see the deprecation warning if you start Julia with --depwarn=yes)
    • The general pattern for writing transformations is source_columns => function => target_column_name. You can also use the shorthand source_columns => function, in which case the name of the target column is generated automatically. nrow is a special case: for convenience, you do not have to pass source columns to it, so you can write either nrow alone (the column name is then generated automatically) or nrow => target_column_name.
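The transformation forms described above can be compared side by side in a minimal sketch, assuming DataFrames.jl 0.21 or later; the data frame and the column names g, v, v_total and count are hypothetical and chosen only for illustration.

```julia
using DataFrames

# Hypothetical sample data for illustration
df = DataFrame(g = ["a", "a", "b"], v = [1, 2, 3])
gd = groupby(df, :g)

# Full form: source_columns => function => target_column_name
explicit = combine(gd, :v => sum => :v_total)

# Shorthand: source_columns => function; the target name is generated (:v_sum)
auto = combine(gd, :v => sum)

# nrow special case: no source columns are needed
count_auto = combine(gd, nrow)             # column named :nrow
count_named = combine(gd, nrow => :count)  # column named :count

# Several transformations can be passed in a single combine call
both = combine(gd, nrow => :count, :v => sum => :v_total)
```

The last form is often convenient when you need the group sizes alongside other aggregates, since everything comes back in one data frame keyed by the grouping columns.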