Search code examples
rdplyrsummarization

Group_by / summarize by two variables within a function


I would like to write a function that summarize the provided data by some specified criteria, in this case by age

The example data is a table of users' age and their stats.

df <- data.frame('Age'=rep(18:25,2), 'X1'=10:17, 'X2'=28:35,'X4'=22:29)

Next I define the output columns that are relevant for the analysis

output_columns <- c('Age', 'X1', 'X2', 'X3')

This function computes the basic the sum of X1. X2 and X3 grouped by age.

aggr <- function(data, criteria, output_columns){
  k <- data %>% .[, colnames(.) %in% output_columns] %>%
    group_by_(.dots = criteria) %>%
    #summarise_each(funs(count), age) %>%
    summarize_if(is.numeric, sum)
  return (k)
}

When I call it like this

> e <- aggr(df, "Age", output_columns)
> e
# A tibble: 8 x 3
    Age    X1    X2
  <int> <int> <int>
1    18    20    56
2    19    22    58
3    20    24    60
4    21    26    62
5    22    28    64
6    23    30    66
7    24    32    68
8    25    34    70

I want to have another column called count which shows the number of observations in each age group. Desired output is

> desired
  Age X1 X2 count
1  18 20 56     2
2  19 22 58     2
3  20 24 60     2
4  21 26 62     2
5  22 28 64     2
6  23 30 66     2
7  24 32 68     2
8  25 34 70     2

I have tried different ways to do that, e.g. tally(), summarize_each etc. They all deliver wrong results.

I believe their should be an easy and simple way to do that. Any help is appreciated.


Solution

  • Since you're already summing all variables, you can just add a column of all 1s before the summary function

    aggr <- function(data, criteria, output_columns){ 
        data %>% 
          .[, colnames(.) %in% output_columns] %>%
          group_by_(.dots = criteria) %>%
          mutate(n = 1L) %>%
          summarize_if(is.numeric, sum)
    }
    
    # A tibble: 8 x 4
        Age    X1    X2     n
      <int> <int> <int> <int>
    1    18    20    56     2
    2    19    22    58     2
    3    20    24    60     2
    4    21    26    62     2
    5    22    28    64     2
    6    23    30    66     2
    7    24    32    68     2
    8    25    34    70     2