
Correct way to use variable name with subset within purrr::pmap in R?


I have a tibble called description:

description <- structure(
  list(col1 = "age", col2 = "> 7 months", col3 = "<= 7 months"),
  row.names = c(NA, -1L), class = c("tbl_df", "tbl", "data.frame")
)

> description
# A tibble: 1 × 3
  col1  col2       col3       
  <chr> <chr>      <chr>      
1 age   > 7 months <= 7 months

And a data frame called my_df:

my_df <- structure(list(
  ID = c("ID1", "ID2", "ID3", "ID4", "ID5", "ID6"),
  age = structure(c(1L, 2L, 1L, 2L, 2L, 2L),
                  .Label = c("<= 7 months", "> 7 months"), class = "factor")),
  row.names = c("ID1", "ID2", "ID3", "ID4", "ID5", "ID6"),
  class = "data.frame")

> my_df
     ID         age
ID1 ID1 <= 7 months
ID2 ID2  > 7 months
ID3 ID3 <= 7 months
ID4 ID4  > 7 months
ID5 ID5  > 7 months
ID6 ID6  > 7 months

I currently have the following function:

updated_df <- purrr::pmap(description, function(col1, col2, col3) {
        subset(
            my_df,
            age == col2
        )
})

This produces:

[[1]]
     ID        age
ID2 ID2 > 7 months
ID4 ID4 > 7 months
ID5 ID5 > 7 months
ID6 ID6 > 7 months
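As background, purrr::pmap() calls the function once per row of description and passes each column as a named argument, so inside the function col1, col2 and col3 hold plain character strings ("age", "> 7 months", "<= 7 months") rather than references to columns of my_df. The sketch below illustrates this with base R's Map(), used here only as a stand-in for pmap() so the snippet needs no packages; the helper show_args() is hypothetical.

```r
# One-row stand-in for the description tibble (plain data frame)
description <- data.frame(col1 = "age", col2 = "> 7 months", col3 = "<= 7 months",
                          stringsAsFactors = FALSE)

# Hypothetical helper: report the type of each argument and the value of col1
show_args <- function(col1, col2, col3) {
  c(col1 = class(col1), col2 = class(col2), col3 = class(col3), value = col1)
}

# Like purrr::pmap(description, show_args): one call per row, columns matched by name
res <- do.call(Map, c(list(f = show_args), description))
print(res[[1]])
```

All three arguments arrive as character vectors, and col1 is simply the text "age".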

But I would like to use the column name stored in col1 instead of hard-coding age. I have tried the following, but none of them work:

updated_df <- purrr::pmap(description, function(col1, col2, col3) {
        subset(
            my_df,
            col1 == col2
        )
})


updated_df <- purrr::pmap(description, function(col1, col2, col3) {
        subset(
            my_df,
            !!as.name(col1) == col2
        )
})

updated_df <- purrr::pmap(description, function(col1, col2, col3) {
        subset(
            my_df,
            !!col1 == col2
        )
})

Where am I going wrong, and what is the correct way to use col1 instead of age?
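A quick base R check of what each attempt evaluates to may make the failure clearer. Inside subset(), col1 is not a column of my_df, so it is found in the function's environment and is just the string "age". Base R's subset() also gives !! no special meaning: the parser reads it as two logical negations, and because ! binds more loosely than ==, the whole comparison is negated twice. Each condition therefore collapses to a single FALSE. In the sketch, col1 and col2 are set by hand to the values pmap() would pass, and my_df is a trimmed two-row version.

```r
# Values that pmap() passes for the single row of description
col1 <- "age"
col2 <- "> 7 months"

# Attempt 1: a comparison of two strings; no column of my_df is looked up
attempt1 <- col1 == col2

# Attempt 2: as.name(col1) is the symbol age, but == just compares its name
# with the string, and base R reads !! as two logical negations
attempt2 <- !!as.name(col1) == col2

# Attempt 3: the same double negation, applied to "age" == "> 7 months"
attempt3 <- !!col1 == col2

# ! binds more loosely than ==, so !!col1 == col2 parses as !(!(col1 == col2))
parsed <- quote(!!col1 == col2)
print(parsed[[2]][[2]][[1]])  # the innermost call is ==

# A single FALSE is recycled over every row, so subset() returns zero rows
my_df <- data.frame(ID = c("ID1", "ID2"), age = c("<= 7 months", "> 7 months"))
print(nrow(subset(my_df, col1 == col2)))
```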


Solution

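  • If you want to stay with subset, you need to build the full expression with the column named by col1 spliced in as a symbol, and then evaluate it, since base R's subset gives rlang's !! no special meaning (to base R, !! is just two negations). For example: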

    purrr::pmap(description, function(col1, col2, col3) {
      subset(
        my_df,
        eval(rlang::expr(!!rlang::ensym(col1) == col2))
      )
    })
    

    If you're open to using dplyr::filter instead of subset, which supports rlang's !! injection directly, you can do:

    purrr::pmap(description, function(col1, col2, col3) {
      dplyr::filter(my_df, !!rlang::ensym(col1) == col2)
    })
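If you'd prefer to avoid rlang altogether, two base R options are sketched below. The function names filter_by_index and filter_by_subset are made up for illustration, and my_df and description are rebuilt as plain data frames so the snippet is self-contained. The first drops subset() and picks the column by name with [[, using which() so that rows where the comparison is NA are dropped, as subset() would do. The second keeps subset() and applies the same idea as the rlang version above, building the call with bquote() so the symbol named by col1 is spliced in before evaluation. Map() stands in for purrr::pmap() here; either function can also be passed to purrr::pmap(description, ...) directly.

```r
# Self-contained copies of the question's data (plain data frames)
my_df <- data.frame(
  ID = c("ID1", "ID2", "ID3", "ID4", "ID5", "ID6"),
  age = factor(c("<= 7 months", "> 7 months", "<= 7 months",
                 "> 7 months", "> 7 months", "> 7 months")),
  stringsAsFactors = FALSE
)
rownames(my_df) <- my_df$ID
description <- data.frame(col1 = "age", col2 = "> 7 months", col3 = "<= 7 months",
                          stringsAsFactors = FALSE)

# Option 1 (hypothetical name): look the column up by name with [[;
# which() drops NA comparisons, matching subset()'s behaviour
filter_by_index <- function(col1, col2, col3) {
  my_df[which(my_df[[col1]] == col2), , drop = FALSE]
}

# Option 2 (hypothetical name): build subset(my_df, age == "> 7 months")
# with bquote(), splicing in the symbol named by col1, then evaluate it
filter_by_subset <- function(col1, col2, col3) {
  cl <- bquote(subset(my_df, .(as.name(col1)) == .(col2)))
  eval(cl)
}

# Base R stand-in for purrr::pmap(description, f): one call per row
res_index  <- do.call(Map, c(list(f = filter_by_index), description))
res_subset <- do.call(Map, c(list(f = filter_by_subset), description))
print(res_index[[1]])
print(res_subset[[1]])
```

Option 1 is usually the simpler choice when the column name arrives as a string; the help page for subset() itself notes that it is a convenience function intended for interactive use and recommends standard subsetting such as [ for programming.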