Let's say I want to calculate the mean (or a custom function) on column A based on different values in columns B-D. Here is the data:
input:
data <- data.frame(A = round(runif(20,min = 0,max = 10),0),
B = round(runif(20,min = 0,max = 1),0),
C = round(runif(20,min = 0,max = 1),0),
D = round(runif(20,min = 0,max = 1),0))
output (note your rand numbers might result in different summary table):
col value mean
B 0 5.92
B 1 4.71
C 0 6
C 1 5.17
D 0 4.89
D 1 6
I can do it for each column separately:
data %>% group_by(B) %>% summarise(mean(A))
I put it in a for loop
:
p <- data.frame(NULL)
for(i in c('B','C','D')){
q <- data %>% group_by_(i) %>% summarise(col=i,mean = mean(A))
p <- append(p,q)
}
but it didnt work quite as expected. Any suggestions would be very helpful.
An option would be to gather
the data into 'long' format, grouped by 'key', 'val' columns, get the mean
of 'A'
library(tidyverse)
gather(data, key, val, B:D) %>%
group_by(key, val) %>%
summarise(A = mean(A))
Or in base R
, by unlist
ing the columnss from 'B' to 'D' and using the grouping column as 'A' with the replicated column names
aggregate(A ~ ., cbind(data['A'], cN = names(data)[-1][col(data[-1])],
group = unlist(data[-1])), mean)
set.seed(24)
data <- data.frame(A = round(runif(20,min = 0,max = 10),0),
B = round(runif(20,min = 0,max = 1),0),
C = round(runif(20,min = 0,max = 1),0),
D = round(runif(20,min = 0,max = 1),0))