I am encountering an issue when I use the extraction operator `$() inside of a function. The problem does not exist if I follow the same logic outside of the loop, so I assume there might be a scoping issue that I'm unaware of.
The general setup:
## Make some fake data for your reproducible needs.
set.seed(2345)
my_df <- data.frame(cat_1 = sample(c("a", "b"), 100, replace = TRUE),
cat_2 = sample(c("c", "d"), 100, replace = TRUE),
continuous = rnorm(100),
stringsAsFactors = FALSE)
head(my_df)
This process I am trying to dynamically reproduce:
index <- which(`$`(my_df, "cat_1") == "a")
my_df$continuous[index]
But once I program this logic into a function, it fails:
## Function should take a string for the following:
## cat_var - string with the categorical variable name as it appears in df
## level - a level of cat_var appearing in df
## df - data frame to operate on. Function assumes it has a column
## "continuous".
extract_sample <- function(cat_var, level, df = my_df) {
index <- which(`$`(df, cat_var) == level)
df$continuous[index]
}
## Does not work.
extract_sample(cat_var = "cat_1", level = "a")
This is returning numeric(0)
. Any thoughts on what I'm missing? Alternative approaches are welcome as well.
The problem isn't the function, it's the way $
handles the input.
cat_var = "cat_1"
length(`$`(my_df,"cat_1"))
#> [1] 100
length(`$`(my_df,cat_var))
#> [1] 0
You can instead use [[
to achieve your desired outcome.
cat_var = "cat_1"
length(`[[`(my_df,"cat_1"))
#> [1] 100
length(`[[`(my_df,cat_var))
#> [1] 100
UPDATE
It's been noted that using [[
this way is ugly. And it is. It's useful when you want to write something like lapply(stuff,'[[',1)
Here, you should probably be writing it as my_df[[cat_var]]
.
Also, this question/answer goes into a little more detail about why $
doesn't work the way you want it to.