r loops dplyr nse

Using dplyr within a loop to summarise several data.frame variables


I want to summarise several columns of a data.frame. I did the grouping and summarising with dplyr, as in the example below.

library(dplyr)

df = data.frame(time = rep(c("day", "night"), 10),
    who = rep(c("Paul", "Simon"), each = 10),
    var1 = runif(20, 5, 15), var2 = runif(20, 10, 12), var3 = runif(20, 2, 7), var4 = runif(20, 1, 3))

Writing the function I need

quantil_x = function (var, num) { quantile(var, num, na.rm=TRUE) }

Applying it to var1 and exporting

percentiles = df %>% group_by(time, who) %>% summarise(
    P0 = quantil_x (var1, 0),
    P25 = quantil_x (var1, .25),
    P75 = quantil_x (var1, .75)
    )
write.table(percentiles, file = "summary_var1.csv",row.names=FALSE, dec=",",sep=";")

What I want is to repeat this same task for 'var2', 'var3' and 'var4'. I tried to run a loop to do this, with no success. I couldn't find a way to refer to a different variable on each iteration. Within the loop I tried summarise_(), get() (both inside the function quantil_x() and inside summarise), and as.name, but none of these worked.

I'm pretty sure this comes down to my own coding skills, but it's all I have come up with so far. Here is an example of what I tried to do:

list = c("var1", "var2", "var3", "var4")
for (i in list){
percentiles = df %>% group_by(time, who) %>% summarise(
    P0 = quantil_x (get(i), 0),
    P25 = quantil_x (get(i), .25),
    P75 = quantil_x (get(i), .75)
    )
write.table(percentiles, file = paste0("summary_",i,".csv"),row.names=FALSE, dec=",",sep=";")
}

I read this post, but it didn't help much in my case.

Thanks in advance.


Solution

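  • You can do this with gather() from tidyr. It reshapes the var columns into long format, so a single group_by()/summarise() covers all of them.

    library(tidyr)

    percentiles = df %>%
      gather(Var, Value, var1:var4) %>%   # one row per (time, who, Var) value
      group_by(Var, time, who) %>%
      summarise(
        P0 = quantil_x (Value, 0),
        P25 = quantil_x (Value, .25),
        P75 = quantil_x (Value, .75)
      )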
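
If you would rather keep the loop and get one CSV per variable, as in the question, something along these lines should work. It is only a sketch: it assumes dplyr 1.0 or later (for the .data pronoun and the .groups argument), and set.seed(), out_dir and results are my own additions for illustration, not part of the original code.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed so the example is reproducible

# Same example data and helper as in the question
df <- data.frame(
  time = rep(c("day", "night"), 10),
  who  = rep(c("Paul", "Simon"), each = 10),
  var1 = runif(20, 5, 15), var2 = runif(20, 10, 12),
  var3 = runif(20, 2, 7),  var4 = runif(20, 1, 3)
)
quantil_x <- function(var, num) quantile(var, num, na.rm = TRUE)

vars    <- c("var1", "var2", "var3", "var4")
out_dir <- tempdir()  # hypothetical output folder; point it wherever you need
results <- list()     # hypothetical: keeps each summary in memory as well

for (v in vars) {
  percentiles <- df %>%
    group_by(time, who) %>%
    summarise(
      # .data[[v]] looks up the column whose name is stored in the string v
      P0  = quantil_x(.data[[v]], 0),
      P25 = quantil_x(.data[[v]], 0.25),
      P75 = quantil_x(.data[[v]], 0.75),
      .groups = "drop"
    )

  # One file per variable: summary_var1.csv, summary_var2.csv, ...
  write.table(percentiles,
              file = file.path(out_dir, paste0("summary_", v, ".csv")),
              row.names = FALSE, dec = ",", sep = ";")

  results[[v]] <- percentiles
}
```

The gather() version gives you everything in one table, which is usually easier to work with afterwards. The loop is mainly useful if you really need the separate files.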