Custom function returning incorrect value - example (conflicting vector names)


For some reason, I get a different value when I run my code inside a custom function than when I run the same code directly. The function:

library(dplyr)    # lag(), lead()
library(stringr)  # str_count()

MSD_p <- function(values, letters) {
  # absolute differences for every pair of values, plus the matching pair of letters
  a <- abs(apply(combn(values, 2), 2, diff))
  b <- combn(letters, 2)
  c <- data.frame(t(rbind(a, b)))
  c$a <- as.numeric(c$a)
  c <- c[order(c$a), ]
  c$unique <- !sapply(gsub(" ", "", paste(c$V2, c$V3)), function(x) any(str_count(x, letters) > 1))
  m_1 <- mean(c(min(c[c$unique == TRUE, ]$a), max(c[c$unique == FALSE, ]$a)))
  c$unique_lag_3 <- ifelse(lag(c$unique) != c$unique &
                           lag(lag(c$unique)) != c$unique &
                           lead(c$unique) == c$unique, "New", "Same")
  rows <- lapply(which(c$unique_lag_3 == "New"), function(x) (x - 1):(x))
  m_2 <- mean(c[unlist(rows), ]$a)
  m_2 <- as.numeric(m_2)
  return(m_2)
}

MSD_p(dt3$values,dt3$letters)

This results in 8.5.

versus running the same code directly, outside the function:

a <- abs(apply(combn(dt3$values, 2), 2, diff))
b <- combn(dt3$letters, 2)
c <- data.frame(t(rbind(a, b)))
c$a <- as.numeric(c$a)
c <- c[order(c$a), ]
c$unique <- !sapply(gsub(" ", "", paste(c$V2, c$V3)), function(x) any(str_count(x, letters) > 1))
m_1 <- mean(c(min(c[c$unique == TRUE, ]$a), max(c[c$unique == FALSE, ]$a)))
c$unique_lag_3 <- ifelse(lag(c$unique) != c$unique &
                         lag(lag(c$unique)) != c$unique &
                         lead(c$unique) == c$unique, "New", "Same")
rows <- lapply(which(c$unique_lag_3 == "New"), function(x) (x - 1):(x))
m_2 <- mean(c[unlist(rows), ]$a)
m_2 <- as.numeric(m_2); m_2

This results in 9.75.

How is this possible? Data:

dt3<-data.frame(structure(list(trial_number = c(20L, 20L, 20L, 20L, 20L, 20L, 
20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 
20L), values = c(74.7, 81.1, 80.1, 90.1, 98.9, 96.1, 93.5, 95, 
99.6, 93.3, 96.7, 92.7, 94.7, 92.1, 100.3, 97.4, 94.1, 97.3, 
97.1, 93.1), letters = c("g", "cd", "d", "bc", "ab", "ab", "ab", 
"ab", "ab", "ab", "ab", "ab", "ab", "ab", "a", "ab", "ab", "ab", 
"ab", "ab")), row.names = c(NA, -20L), class = c("tbl_df", "tbl", 
"data.frame")))

Appreciate the help.


Solution

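  • Inside MSD_p(), letters is the function's argument, so it refers to dt3$letters. In the code run outside the function, however, the OP uses letters, which is R's built-in vector of the 26 lowercase letters, not the letters column of dt3: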

    c$unique <- !sapply(gsub(" ", "", paste(c$V2, c$V3)), 
           function(x) any(str_count(x, letters)>1))
    

    It should be dt3$letters.

    More generally, giving objects the names of existing functions (c) or of built-in vectors (letters) makes bugs like this one easy to introduce.


    Because the code reuses the name of a built-in object, the fix is to refer to dt3$letters explicitly in place of letters. To reproduce the function's output, change the line above to

    c$unique <- !sapply(gsub(" ", "", paste(c$V2, c$V3)), 
           function(x) any(str_count(x, dt3$letters)>1))
    

    Running the code gives

    m_2
    [1] 8.5
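The mechanism can be seen in isolation with a small base R sketch. It assumes a hypothetical helper, count_matches(), standing in for stringr::str_count() so the example runs without stringr, and it types out unique(dt3$letters) by hand. It shows that the pair "bc"/"ab" (rows 4 and 5 of dt3) is classified differently depending on which letters the test sees, which is one way the two runs can end up with different c$unique columns.

```r
# 1. Inside a function, an argument named `letters` shadows base::letters.
first_pattern <- function(letters) letters[1]
first_pattern(c("g", "cd", "d"))  # uses the argument: "g"
letters[1]                        # at top level this is base::letters: "a"

# 2. Hypothetical base-R stand-in for stringr::str_count(x, patterns):
#    number of non-overlapping matches of each pattern in the single string x.
count_matches <- function(x, patterns) {
  vapply(patterns, function(p) sum(gregexpr(p, x)[[1]] > 0), integer(1))
}

# Same test as in the question: does any pattern occur more than once?
repeats_pattern <- function(x, patterns) any(count_matches(x, patterns) > 1)

dt3_patterns <- c("g", "cd", "d", "bc", "ab", "a")  # unique(dt3$letters), typed by hand

# The pair "bc" + "ab" after paste() and gsub(" ", "", ...):
repeats_pattern("bcab", dt3_patterns)  # FALSE, so c$unique is TRUE
repeats_pattern("bcab", letters)       # TRUE ("b" occurs twice), so c$unique is FALSE
```

Because that single pair flips between unique and non-unique, the rows that lag()/lead() flag as "New" can change, and with them the mean stored in m_2.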