
How do I write a simple function that incorporates a function from an R package?


This runs fine when I hard-code the column names, but when I try to generalize it a bit with "score" and "outcome" arguments, it fails (see the end). Any idea how to do this? (The indices argument is there because I want to bootstrap this later.)

library(PRROC)
library(dplyr)
df <- iris %>%
  filter(Species != "virginica") %>%
  mutate(outcome_versi = ifelse(Species == "versicolor", 1, 0)) %>%
  select(Sepal.Length, outcome_versi)

#Iris single AUC
fc <- function(data, indices){
  d <- data[indices,]
  versi.y <- d %>% filter(outcome_versi == 1) %>% select(Sepal.Length)
  versi.n <- d %>% filter(outcome_versi == 0) %>% select(Sepal.Length)
  prroc.sepal.length <- pr.curve(scores.class0 = versi.y$Sepal.Length, scores.class1 = versi.n$Sepal.Length, curve = TRUE)
  return(prroc.sepal.length$auc.integral)
}

fc(df)
#AUC = 0.94

#Iris single AUC - functionalized
fcf <- function(score, outcome, data, indices){
  d <- data[indices,]
  test.pos <- d %>% filter(outcome==1) %>% select(score)
  test.neg <- d %>% filter(outcome==0) %>% select(score)
  prroc.test <-pr.curve(scores.class0 = test.pos$score, scores.class1 = test.neg$score, curve=T)
  return(prroc.test$auc.integral)
}

fcf(data=df, score=Sepal.Length, outcome = outcome_versi)
#Error: 'outcome' not found

Solution

  • As I mentioned yesterday, this is a standard non-standard evaluation (NSE) problem, which you will [almost] always run into when programming with the tidyverse. It arises because the tidyverse allows you to write, for example,

    iris %>% filter(Sepal.Length < 6)
    

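    No object called Sepal.Length exists in the global environment when filter() is called, yet no error is thrown and the code works "as expected", because filter() evaluates its arguments with the columns of the data frame in scope (the "data masking" form of NSE). In your function, outcome is not a column of d, so filter() falls back to the variable outcome in the function's environment, and evaluating that argument fails.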
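A minimal sketch of the data-masking behaviour described above, using only iris and dplyr: a column shadows a global variable of the same name, and a name that is not a column falls through to the surrounding environment. The global variable Sepal.Length and the name cutoff are illustrative only.

```r
library(dplyr)

# A global variable that happens to share its name with a column of iris
Sepal.Length <- 100

# filter() evaluates `Sepal.Length < 6` with the columns of iris in scope first,
# so the column is used and the global value 100 is ignored
masked <- filter(iris, Sepal.Length < 6)

# A name that is not a column falls through to the calling environment
cutoff <- 6
fallback <- filter(iris, Sepal.Length < cutoff)
```

In fcf(), that fall-through finds the function argument outcome, whose value is the unevaluated expression outcome_versi; evaluating it in the caller's environment fails, because outcome_versi exists only as a column of df.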

    Here's how I deal with this in your situation. Note that I have removed the indices parameter from the function, because I feel row selection is more naturally handled by a call to filter earlier in the pipe, and I have moved data (now d) to be the first parameter of the function so that it fits more naturally into a pipe.

    Also, I don't have the PRROC package, so I have commented out the call to it inside the function and replaced the original return value accordingly. Simply make the obvious changes to get the functionality you need. The solution to the NSE issue does not depend on access to PRROC.

    library(magrittr)
    library(dplyr)
    
    fcf <- function(d, score=Sepal.Length, outcome = outcome_versi){
      qScore <- enquo(score)
      qOutcome <- enquo(outcome)
    
      test.pos <- d %>% filter(!! qOutcome == 1) %>% select(!! qScore)
      test.neg <- d %>% filter(!! qOutcome == 0) %>% select(!! qScore)
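      # pull() extracts the single selected column as a vector; test.pos$score would be
      # NULL here, because the column keeps its original name (e.g. Sepal.Length)
      # prroc.test <- pr.curve(scores.class0 = pull(test.pos), scores.class1 = pull(test.neg), curve = TRUE)
      # return(prroc.test$auc.integral)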
      return(list("pos"=test.pos, "neg"=test.neg))
    }
    
    # as_tibble simply to improve formatting
    as_tibble(iris) %>% 
      mutate(outcome_versi = ifelse(Species == "versicolor", 1, 0)) %>% 
      fcf()
    
    $pos
    # A tibble: 50 × 1
       Sepal.Length
              <dbl>
     1          7  
     2          6.4
     3          6.9
     4          5.5
     5          6.5
     6          5.7
     7          6.3
     8          4.9
     9          6.6
    10          5.2
    # … with 40 more rows
    
    $neg
    # A tibble: 100 × 1
       Sepal.Length
              <dbl>
     1          5.1
     2          4.9
     3          4.7
     4          4.6
     5          5  
     6          5.4
     7          4.6
     8          5  
     9          4.4
    10          4.9
    # … with 90 more rows
    

    And similarly,

    set.seed(123)
    as_tibble(iris) %>% 
      mutate(
        outcome_versi = ifelse(Species == "versicolor", 1, 0),
        RandomOutcome = runif(nrow(.)) > 0.5
      ) %>% 
      filter(Sepal.Length < 6) %>% 
      fcf(score=Petal.Width, outcome=RandomOutcome)
    
    $pos
    # A tibble: 40 × 1
       Petal.Width
             <dbl>
     1         0.2
     2         0.2
     3         0.2
     4         0.3
     5         0.2
     6         0.2
     7         0.2
     8         0.1
     9         0.1
    10         0.4
    # … with 30 more rows
    
    $neg
    # A tibble: 43 × 1
       Petal.Width
             <dbl>
     1         0.2
     2         0.2
     3         0.4
     4         0.1
     5         0.2
     6         0.2
     7         0.4
     8         0.3
     9         0.3
    10         0.2
    # … with 33 more rows
    
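The answer drops indices, so fcf() cannot be handed to boot() as is. Below is a sketch of one way to restore it for the bootstrapping the question mentions, assuming the boot package (shipped with R as a recommended package). The hypothetical fcf_boot() keeps the enquo()/!! pattern and takes (d, indices) as its first two arguments, as boot() requires. Because PRROC may not be installed, the returned statistic is a stand-in (difference in mean score between the two classes), not the PR AUC.

```r
library(dplyr)
library(boot)

# Data from the question
df <- iris %>%
  filter(Species != "virginica") %>%
  mutate(outcome_versi = ifelse(Species == "versicolor", 1, 0)) %>%
  select(Sepal.Length, outcome_versi)

# Hypothetical fcf() variant with `indices` restored, in the (data, indices) form boot() expects.
# The mean difference is a stand-in statistic; with PRROC installed, you could return
# pr.curve(scores.class0 = pos, scores.class1 = neg)$auc.integral instead.
fcf_boot <- function(d, indices, score, outcome) {
  qScore <- enquo(score)
  qOutcome <- enquo(outcome)
  d <- d[indices, , drop = FALSE]  # the bootstrap resample of rows
  pos <- d %>% filter(!! qOutcome == 1) %>% pull(!! qScore)
  neg <- d %>% filter(!! qOutcome == 0) %>% pull(!! qScore)
  mean(pos) - mean(neg)
}

set.seed(123)
# Wrap the call so the bare column names are captured by enquo() inside fcf_boot()
b <- boot(df, function(d, i) fcf_boot(d, i, Sepal.Length, outcome_versi), R = 200)
```

Wrapping the call in an anonymous function keeps Sepal.Length and outcome_versi as ordinary code in a direct call, so enquo() captures them exactly as it does when you call the function yourself.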

    Finally, if you want to use an enquoted variable as the name on the left-hand side of an argument, as in mutate(new_name = ...), then you need to use := rather than =.
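A minimal sketch of the := rule with a hypothetical helper, double_col(), that is not from the question: the value is captured with enquo() as in fcf(), while the new column's name is captured as a symbol with rlang::ensym() and unquoted on the left-hand side.

```r
library(dplyr)

# Hypothetical helper for illustration: add a column named by `new_col`
# that holds twice the values of `col`
double_col <- function(d, col, new_col) {
  qCol <- enquo(col)            # the value expression, captured as a quosure
  nm <- rlang::ensym(new_col)   # the new column's name, captured as a symbol
  # An unquoted name on the left-hand side needs := ;
  # writing `!!nm = ...` would be a syntax error in R
  d %>% mutate(!!nm := (!!qCol) * 2)
}

out <- double_col(iris, Sepal.Length, SL2)
```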