I'm curious how I could convert a for loop that I've written into a function in R? I've no experience with writing my own functions in R. I looked here and here but these did not seem to offer much help. I am aware that for loops are not necessary and overall I'm trying to do something similar to this blog post.
The for loop with reproducible data is here:
P <- c(1:50)
y <- length(P)
D <- as.data.frame(combs(P,2))
Z <- choose(y,2)
Num = NULL
Denom = NULL
Diff = NULL
for(n in 1:Z)
{
Num[n] = abs(D$V1[n]-D$V2[n])
Denom[n] = max(D$V1[n], D$V2[n])
Diff[n] = Num[n]/Denom[n]
}
PV=mean(Diff)
PV
But, I'm interested in calculating PV based on levels such as in this data:
DATA <- c(1:500)
NAME <- c("a", "b", "c", "d", "e")
mydf <- as.data.frame(cbind(DATA, NAME))
Therefore, my final code I would like to use would be:
ANSWER <- tapply(mydf$DATA, mydf$NAME, MY.FUNCTION)
So, if I could turn the above for loop into a working function I could run the tapply function to get PV based on levels.
Any help would be appreciated or any other suggestions opposed to the one I offer.
Thanks!
Once you have your library loaded:
library(caTools)
Here's a function you can run on your data:
mymeandiff <- function(values){
df <- as.data.frame(combs(values, 2))
diff <- abs(df$V1 - df$V2)/pmax(df$V1, df$V2)
mean(diff)
}
mymeandiff(1:50)
Then we can use dplyr
to run on each group (after correcting the data):
mydf$DATA <-as.numeric(as.character(mydf$DATA))
library(dplyr)
mydf %>% group_by(NAME) %>%
summarise(mymeandiff(DATA))
For apply, rather than dplyr:
tapply(mydf$DATA, mydf$NAME, FUN = mymeandiff)
Let's time it:
microbenchmark::microbenchmark(tapply = tapply(mydf$DATA, mydf$NAME, FUN=mymeandiff),
dplyr = mydf %>% group_by(NAME) %>%
summarise(mymeandiff(DATA)))
Unit: milliseconds
expr min lq mean median uq max neval
tapply 60.36543 61.08658 63.81995 62.61182 66.13671 80.37819 100
dplyr 61.84766 62.53751 67.33161 63.61270 67.58688 287.78364 100
tapply
is slightly faster