I am trying to turn the following code, which works properly, into a function.
result_check <- data %>%
  group_by(column1, target) %>%
  summarise(Unique_Elements = n()) %>%
  dcast(column1 ~ target, value.var="Unique_Elements")
For example, if we take the following dataset:
column1 target
AA YES
BB NO
BC NO
AA YES
The code would aggregate the dataset by the target variable, like this:
column1 YES NO
AA 2 0
BB 0 1
BC 0 1
This is how I construct the function:
aggregate_per_group <- function(column) {
  data %>%
    group_by(column, target) %>%
    summarise(Unique_Elements = n()) %>%
    dcast(column ~ target, value.var="Unique_Elements")
}
But I get: Error: unknown variable to group by : column. I know it's a basic question, but any clues why I am losing the argument in group_by?
I have tried using the "group_by_" implementation, as well as require("dplyr"), but they seem unrelated.
We can use table from base R:
table(data)
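For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the four-row example from the question. It assumes the data sits in a two-column data frame called data; the construction below is a reconstruction for illustration, not the asker's actual object.

```r
# Rebuild the example from the question (assumed structure: two character columns)
data <- data.frame(
  column1 = c("AA", "BB", "BC", "AA"),
  target  = c("YES", "NO", "NO", "YES"),
  stringsAsFactors = FALSE
)

# Cross-tabulate column1 against target; combinations that never occur count as 0
counts <- table(data)
counts

# Individual cells can be looked up by name
counts["AA", "YES"]
```

Note that table(data) cross-tabulates every column of the data frame, so with more than two columns it is safer to select the two of interest first.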
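The error occurs because group_by treats column as a literal column name instead of using the value passed to the function. If we are interested in a function, pass the column name as a string and use group_by_ along with spread from tidyr: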
aggregate_per_group <- function(column) {
  data %>%
    group_by_(column, "target") %>%
    summarise(Unique_Elements = n()) %>%
    spread(target, Unique_Elements, fill = 0)
}
library(dplyr)
library(tidyr)
aggregate_per_group("column1")
# column1 NO YES
# * <chr> <dbl> <dbl>
#1 AA 0 2
#2 BB 1 0
#3 BC 1 0
If we need dcast from reshape2:
library(reshape2)
aggregate_per_group <- function(column) {
  data %>%
    group_by_(column, "target") %>%
    summarise(Unique_Elements = n()) %>%
    dcast(data = ., paste(column, '~ target'),
          value.var="Unique_Elements", fill = 0)
}
aggregate_per_group("column1")
# column1 NO YES
#1 AA 0 2
#2 BB 1 0
#3 BC 1 0
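In later releases, group_by_ was deprecated in dplyr and spread was superseded by pivot_wider in tidyr. A rough sketch of the same function with the newer verbs is below. It assumes dplyr >= 1.0.0 and tidyr >= 1.1.0; the df argument and the name aggregate_per_group2 are hypothetical additions so the function does not depend on a global data object.

```r
library(dplyr)
library(tidyr)

# Example data from the question (reconstructed for illustration)
data <- data.frame(
  column1 = c("AA", "BB", "BC", "AA"),
  target  = c("YES", "NO", "NO", "YES"),
  stringsAsFactors = FALSE
)

# `df` and `aggregate_per_group2` are hypothetical names, not from the original post.
# `column` is a string holding the name of the grouping column.
aggregate_per_group2 <- function(df, column) {
  df %>%
    # all_of() turns the character vector into a column selection
    group_by(across(all_of(c(column, "target")))) %>%
    summarise(Unique_Elements = n(), .groups = "drop") %>%
    # One column per target value; missing combinations are filled with 0
    pivot_wider(names_from = target,
                values_from = Unique_Elements,
                values_fill = 0)
}

res <- aggregate_per_group2(data, "column1")
res
```

Passing the data frame in explicitly also makes the function easier to reuse on other datasets that have a target column.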