Tags: r, function, loops, apply, levels

Going from a for loop to a function in R


How can I convert a for loop that I've written into a function in R? I have no experience writing my own functions in R. I looked here and here, but these did not seem to offer much help. I am aware that for loops are not strictly necessary, and overall I'm trying to do something similar to this blog post.

The for loop with reproducible data is here:

library(caTools)  # provides combs()

P <- c(1:50)
y <- length(P)
D <- as.data.frame(combs(P,2))  # one row per pair, in columns V1 and V2
Z <- choose(y,2)                # number of pairs
Num <- NULL
Denom <- NULL
Diff <- NULL

for(n in 1:Z)
    {
    Num[n] <- abs(D$V1[n]-D$V2[n])
    Denom[n] <- max(D$V1[n], D$V2[n])
    Diff[n] <- Num[n]/Denom[n]
    }
PV <- mean(Diff)
PV

However, I'm interested in calculating PV separately for each level of a grouping variable, as in this data:

DATA <- c(1:500)
NAME <- c("a", "b", "c", "d", "e")
mydf <- as.data.frame(cbind(DATA, NAME))

The final code I would like to use is therefore:

ANSWER <- tapply(mydf$DATA, mydf$NAME, MY.FUNCTION) 

So, if I could turn the for loop above into a working function, I could pass it to tapply to get PV for each level.

Any help would be appreciated, as would any alternative to the approach I've outlined.

Thanks!


Solution

  • First load caTools, which provides combs():

    library(caTools)
    

    Here's a function you can run on your data:

    mymeandiff <- function(values){
        df <- as.data.frame(combs(values, 2))
        diff <- abs(df$V1 - df$V2)/pmax(df$V1, df$V2)
        mean(diff)
    }
    mymeandiff(1:50)
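
    As a cross-check that the vectorised calculation reproduces the original loop, here is a minimal sketch. It swaps in base R's combn() for caTools' combs() purely so that it runs without extra packages; the names mymeandiff_base and loop_pv are hypothetical and not part of the original answer:

    ```r
    # Base-R sketch: utils::combn() stands in for caTools::combs()
    # (an assumption made so the example needs no extra packages).
    mymeandiff_base <- function(values) {
        pairs <- combn(values, 2)  # 2-row matrix, one pair per column
        diff <- abs(pairs[1, ] - pairs[2, ]) / pmax(pairs[1, ], pairs[2, ])
        mean(diff)
    }

    # The loop from the question, wrapped in a function for comparison
    loop_pv <- function(values) {
        pairs <- combn(values, 2)
        n_pairs <- ncol(pairs)
        Diff <- numeric(n_pairs)
        for (n in seq_len(n_pairs)) {
            Diff[n] <- abs(pairs[1, n] - pairs[2, n]) / max(pairs[1, n], pairs[2, n])
        }
        mean(Diff)
    }

    # TRUE when the two versions agree
    all.equal(mymeandiff_base(1:50), loop_pv(1:50))
    ```

    The key change from the loop is pmax(), which takes the element-wise maximum of two vectors, whereas max() collapses its arguments to a single value.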
    

    Then we can use dplyr to run it on each group (after converting DATA back to numeric):

    mydf$DATA <- as.numeric(as.character(mydf$DATA))
    
    library(dplyr)
    mydf %>% group_by(NAME) %>%
             summarise(mymeandiff(DATA))
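
    The conversion of DATA is needed because cbind() on a numeric and a character vector produces a character matrix, so DATA stops being numeric once it is in the data frame. A short sketch of the problem, along with one possible way to avoid it by building the data frame with data.frame() directly (the names bad and good are hypothetical):

    ```r
    DATA <- 1:500
    NAME <- c("a", "b", "c", "d", "e")

    # cbind() coerces both vectors to a common type (character),
    # so the DATA column is no longer numeric
    bad <- as.data.frame(cbind(DATA, NAME))
    is.numeric(bad$DATA)   # FALSE

    # data.frame() keeps each column's own type;
    # NAME is recycled to match the 500 rows of DATA
    good <- data.frame(DATA, NAME)
    is.numeric(good$DATA)  # TRUE
    ```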
    

    With tapply, rather than dplyr:

    tapply(mydf$DATA, mydf$NAME, FUN = mymeandiff)
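
    tapply returns one value per level of the grouping variable, as an array named by level. A minimal self-contained sketch of what the result looks like; pv is a hypothetical stand-in for mymeandiff, written with base combn() so it runs without caTools:

    ```r
    # Hypothetical stand-in for mymeandiff, using base combn()
    pv <- function(values) {
        pairs <- combn(values, 2)
        mean(abs(pairs[1, ] - pairs[2, ]) / pmax(pairs[1, ], pairs[2, ]))
    }

    # Numeric DATA, with NAME recycled across the 500 rows
    mydf <- data.frame(DATA = 1:500, NAME = c("a", "b", "c", "d", "e"))

    ANSWER <- tapply(mydf$DATA, mydf$NAME, FUN = pv)
    names(ANSWER)   # "a" "b" "c" "d" "e"
    ANSWER[["a"]]   # PV for a single level
    ```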
    

    Let's time it:

    microbenchmark::microbenchmark(tapply = tapply(mydf$DATA, mydf$NAME, FUN=mymeandiff),
                                   dplyr = mydf %>% group_by(NAME) %>%
                                                    summarise(mymeandiff(DATA)))
    Unit: milliseconds
       expr      min       lq     mean   median       uq       max neval
     tapply 60.36543 61.08658 63.81995 62.61182 66.13671  80.37819   100
      dplyr 61.84766 62.53751 67.33161 63.61270 67.58688 287.78364   100
    

    tapply is slightly faster here (median about 62.6 ms versus 63.6 ms), but the difference is small.