
Extraction operator `$`() returns zero-length vectors within function


I am encountering an issue when I use the extraction operator `$`() inside a function. The problem does not occur if I follow the same logic outside the function, so I assume there is a scoping issue that I'm unaware of.

The general setup:

## Make some fake data for your reproducible needs.
set.seed(2345)

my_df <- data.frame(cat_1 = sample(c("a", "b"), 100, replace = TRUE),
                    cat_2 = sample(c("c", "d"), 100, replace = TRUE),
                    continuous  = rnorm(100),
                    stringsAsFactors = FALSE)
head(my_df)

This is the process I am trying to reproduce dynamically:

index <- which(`$`(my_df, "cat_1") == "a")

my_df$continuous[index]

But once I program this logic into a function, it fails:

## Function should take a string for the following:
##  cat_var - string with the categorical variable name as it appears in df
##  level - a level of cat_var appearing in df
##  df - data frame to operate on.  Function assumes it has a column 
##    "continuous".
extract_sample <- function(cat_var, level, df = my_df) {

  index <- which(`$`(df, cat_var) == level)

  df$continuous[index]

}

## Does not work.
extract_sample(cat_var = "cat_1", level = "a")

This returns numeric(0). Any thoughts on what I'm missing? Alternative approaches are welcome as well.
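To see where the numeric(0) comes from, the intermediate values can be traced one step at a time. This is a minimal sketch using a small hypothetical data frame `df` in place of `my_df`; the extra `df2` with a column literally named "cat_var" is also hypothetical, added only to show which name `$` actually looks up.

```r
# Hypothetical small data frame standing in for my_df
df <- data.frame(cat_1 = c("a", "b", "a"),
                 continuous = c(1.5, 2.5, 3.5),
                 stringsAsFactors = FALSE)
cat_var <- "cat_1"

col <- `$`(df, cat_var)    # looks up a column named "cat_var", not "cat_1"
print(is.null(col))        # TRUE: there is no such column, so NULL comes back
cmp <- col == "a"          # NULL == "a" gives logical(0)
idx <- which(cmp)          # which(logical(0)) gives integer(0)
res <- df$continuous[idx]  # indexing with integer(0) gives numeric(0)
print(res)

# If a column literally called "cat_var" exists, `$` returns that column
df2 <- df
df2$cat_var <- c("x", "y", "z")
print(`$`(df2, cat_var))   # the "cat_var" column, not cat_1
```

Each step is valid R, so no error is raised; the empty result simply propagates from the NULL returned by `$`.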


Solution

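  • The problem isn't the function; it's the way $ handles its input. $ does not evaluate its second argument but treats it as a literal name, so `$`(my_df, cat_var) looks for a column called "cat_var" rather than "cat_1". No such column exists, so it returns NULL, which has length zero.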

    cat_var = "cat_1"
    length(`$`(my_df,"cat_1"))
    #> [1] 100
    length(`$`(my_df,cat_var))
    #> [1] 0 
    

    You can use [[ instead. Unlike $, it evaluates its index argument, so the value of cat_var is used as the column name.

    cat_var = "cat_1"
    length(`[[`(my_df,"cat_1"))
    #> [1] 100
    length(`[[`(my_df,cat_var))
    #> [1] 100
    

    UPDATE

    It's been noted that calling [[ in functional form like this is ugly, and it is. That form is useful when you want to write something like lapply(stuff, '[[', 1).

    Here, you should probably write it as my_df[[cat_var]] (or df[[cat_var]] inside the function).

    Also, this question/answer goes into a little more detail about why $ doesn't work the way you want it to.
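Applying the [[ suggestion to the function from the question gives something like the sketch below. The small deterministic `my_df` is a hypothetical stand-in for the random data in the question; the function keeps the original interface and only swaps `$` for `[[`.

```r
# Small deterministic stand-in for the asker's my_df (hypothetical data)
my_df <- data.frame(cat_1 = c("a", "b", "a", "b"),
                    cat_2 = c("c", "c", "d", "d"),
                    continuous = c(0.1, 0.2, 0.3, 0.4),
                    stringsAsFactors = FALSE)

## Same arguments as before, but [[ evaluates cat_var, so its value
## (e.g. "cat_1") is used as the column name.
extract_sample <- function(cat_var, level, df = my_df) {
  index <- which(df[[cat_var]] == level)
  df[["continuous"]][index]
}

extract_sample(cat_var = "cat_1", level = "a")
#> [1] 0.1 0.3
extract_sample(cat_var = "cat_2", level = "d")
#> [1] 0.3 0.4
```

Because the default `df = my_df` is evaluated lazily, `my_df` only has to exist when the function is called, not when it is defined.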