This runs fine when I specify everything, but when I try to generalize it a bit with "score" and "outcome" arguments, it fails (see the end). Any idea how to do this? (I have the indices argument because I want to bootstrap this later.)
library(PRROC)
library(dplyr)  # provides %>%, filter(), mutate() and select()
df <- iris %>%
  filter(Species != "virginica") %>%
  mutate(outcome_versi = ifelse(Species == "versicolor", 1, 0)) %>%
  select(Sepal.Length, outcome_versi)
#Iris single AUC
fc <- function(data, indices){
  d <- data[indices,]
  versi.y <- d %>% filter(outcome_versi == 1) %>% select(Sepal.Length)
  versi.n <- d %>% filter(outcome_versi == 0) %>% select(Sepal.Length)
  prroc.sepal.length <- pr.curve(scores.class0 = versi.y$Sepal.Length, scores.class1 = versi.n$Sepal.Length, curve = TRUE)
  return(prroc.sepal.length$auc.integral)
}
fc(df)
#AUC = 0.94
#Iris single AUC - functionalized
fcf <- function(score, outcome, data, indices){
  d <- data[indices,]
  test.pos <- d %>% filter(outcome==1) %>% select(score)
  test.neg <- d %>% filter(outcome==0) %>% select(score)
  prroc.test <- pr.curve(scores.class0 = test.pos$score, scores.class1 = test.neg$score, curve = TRUE)
  return(prroc.test$auc.integral)
}
fcf(data=df, score=Sepal.Length, outcome = outcome_versi)
#Error: 'outcome' not found
As I mentioned yesterday, this is a classic non-standard evaluation (NSE) problem, which you will [almost] always run into when programming with the tidyverse. It arises because the tidyverse allows you to write, for example,
iris %>% filter(Sepal.Length < 6)
All other things being equal, at the time the function is called there is no object called Sepal.Length (it exists only as a column of iris), but no error is thrown and the code works "as expected".
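To make the point concrete, here is a small sketch of where the convenience stops working. The function name naive is made up for illustration; the rest is plain dplyr on the built-in iris data.

```r
library(dplyr)

# filter() evaluates Sepal.Length inside the data frame, so this works even
# though no object of that name exists in the calling environment
n_short <- iris %>% filter(Sepal.Length < 6) %>% nrow()

# Hypothetical wrapper: inside it, filter() sees the name `col`, which is not
# a column, and the column name supplied by the caller cannot be found
naive <- function(d, col) {
  d %>% filter(col < 6)
}

# The call fails, so catch the error instead of stopping the script
result <- tryCatch(
  naive(iris, Sepal.Length),
  error = function(e) "error"
)
```

The wrapper fails for the same reason as the fcf in the question: the argument names are taken literally instead of being replaced by what the caller supplied.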
Here's how I deal with this in your situation. Note that I have removed the condition parameter from the function, because I feel this is more naturally handled by a call to filter earlier in the pipe, and I have moved data/d to be the first parameter of the function so that it fits more naturally into a pipe.
Also, I don't have the PRROC package, so I have commented out the call to it inside the function and replaced the original return value accordingly. Simply make the obvious changes to get the functionality you need. The solution to the NSE issue does not depend on access to PRROC.
library(magrittr)
library(dplyr)
fcf <- function(d, score=Sepal.Length, outcome = outcome_versi){
  # Capture the unevaluated column names supplied by the caller
  qScore <- enquo(score)
  qOutcome <- enquo(outcome)
  # Unquote them with !! wherever a column name is expected
  test.pos <- d %>% filter(!! qOutcome == 1) %>% select(!! qScore)
  test.neg <- d %>% filter(!! qOutcome == 0) %>% select(!! qScore)
  # prroc.test <-pr.curve(scores.class0 = test.pos$score, scores.class1 = test.neg$score, curve=T)
  # return(prroc.test$auc.integral)
  return(list("pos"=test.pos, "neg"=test.neg))
}
# as_tibble simply to improve formatting
as_tibble(iris) %>%
  mutate(outcome_versi = ifelse(Species == "versicolor", 1, 0)) %>%
  fcf()
$pos
# A tibble: 50 × 1
Sepal.Length
<dbl>
1 7
2 6.4
3 6.9
4 5.5
5 6.5
6 5.7
7 6.3
8 4.9
9 6.6
10 5.2
# … with 40 more rows
$neg
# A tibble: 100 × 1
Sepal.Length
<dbl>
1 5.1
2 4.9
3 4.7
4 4.6
5 5
6 5.4
7 4.6
8 5
9 4.4
10 4.9
# … with 90 more rows
And similarly,
set.seed(123)
as_tibble(iris) %>%
  mutate(
    outcome_versi = ifelse(Species == "versicolor", 1, 0),
    RandomOutcome=runif(nrow(.)) > 0.5
  ) %>%
  filter(Sepal.Length < 6) %>%
  fcf(score=Petal.Width, outcome=RandomOutcome)
$pos
# A tibble: 40 × 1
Petal.Width
<dbl>
1 0.2
2 0.2
3 0.2
4 0.3
5 0.2
6 0.2
7 0.2
8 0.1
9 0.1
10 0.4
# … with 30 more rows
$neg
# A tibble: 43 × 1
Petal.Width
<dbl>
1 0.2
2 0.2
3 0.4
4 0.1
5 0.2
6 0.2
7 0.4
8 0.3
9 0.3
10 0.2
# … with 33 more rows
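For what it's worth, the same idea can be written a little more compactly with the embrace operator {{ }}, which (as far as I know) needs rlang 0.4.0 or later. The sketch below is a hypothetical variant called fcf_embrace, not a replacement for the function above; it uses pull() so that the result is a pair of plain numeric vectors, which is the shape that pr.curve's scores.class0 and scores.class1 arguments were given in the question.

```r
library(dplyr)

# Hypothetical variant of fcf(): {{ }} captures and unquotes in one step,
# and pull() extracts the score column as a plain vector
fcf_embrace <- function(d, score, outcome) {
  pos <- d %>% filter({{ outcome }} == 1) %>% pull({{ score }})
  neg <- d %>% filter({{ outcome }} == 0) %>% pull({{ score }})
  list(pos = pos, neg = neg)
}

res <- iris %>%
  mutate(outcome_versi = ifelse(Species == "versicolor", 1, 0)) %>%
  fcf_embrace(score = Sepal.Length, outcome = outcome_versi)

# Number of scores in each group
lengths(res)
```

I have not run this against PRROC, so treat the hand-off to pr.curve as an assumption to check on your side.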
Finally, if you want to use an enquoted variable on the left-hand side of an assignment, then you need to use := rather than =.
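A minimal sketch of that last point, assuming a reasonably recent dplyr/rlang. The helper add_flag and its arguments are invented for illustration only.

```r
library(dplyr)

# Hypothetical helper: the caller chooses both the name of the new column
# and the column it is computed from
add_flag <- function(d, new_col, score, cutoff) {
  qNew <- enquo(new_col)
  qScore <- enquo(score)
  # An unquoted name on the left-hand side requires := instead of =
  d %>% mutate(!!qNew := (!!qScore) > cutoff)
}

flagged <- add_flag(iris, long_sepal, Sepal.Length, 6)
names(flagged)
```

With a plain = the code does not even parse, because R does not accept !!qNew as an argument name; := is ordinary syntax that the tidyverse functions then interpret.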