I have datasets that involve large number of column joins (8-12) and at the same time depending upon the circumstance 1-3 of these columns may not be needed.
Presently I have been writing out these long group-bys using dplyr but with so many columns and changing situations, it is easy to misspell or forget a column.
I'd like to somehow create a variable that goes along with doing this, but I haven't been able to figure out how to due to the quotes that are present when I try to use paste. Can anyone show me a quick example of how to do this?
For example:
library(dplyr)
# I want this group-list not to have quotes so I can drop in my group_by below
my_group_list = paste0("vs"," ","am") #quotes get in the way
mtcars %>% group_by(my_group_list) %>% summarise(countofvalues = n())
If there are many columns, we can specify the columns to group from directly subsetting the column names. In that case, use group_by_
library(dplyr)
mtcars %>%
group_by_(.dots=names(.)[8:9]) %>%
summarise(countofvalues = n())
# vs am countofvalues
# (dbl) (dbl) (int)
#1 0 0 12
#2 0 1 6
#3 1 0 7
#4 1 1 7
The above also works if we have a vector
of values
my_group_list <- c("vs", "am")
mtcars %>%
group_by_(.dots = my_group_list) %>%
summarise(countofvalues = n())
# vs am countofvalues
# (dbl) (dbl) (int)
#1 0 0 12
#2 0 1 6
#3 1 0 7
#4 1 1 7
As the OP mentioned that it is not doing the grouping, we can test it by unite
ing the 'vs' and 'am' columns, use it as grouping variable and then do the n()
.
library(tidyr)
mtcars %>%
unite(vs_am, vs, am) %>%
group_by(vs_am) %>%
summarise(countofvalues = n())
# vs_am countofvalues
# (chr) (int)
#1 0_0 12
#2 0_1 6
#3 1_0 7
#4 1_1 7