
Undefined columns selected when subsetting dataframe inside a function


I have a data frame called "outcome" with a column called "Pneumonia" and some other columns, such as "State" and "Hospital.Name".

When I run the following at the command line:

outcome <- read.csv("Assigment3/outcome-of-care-measures.csv", colClasses = "character")
temp <- subset(outcome, State == "NY", select = c(Hospital.Name, Pneumonia))

it works and creates the temp data frame with 2 columns, Hospital.Name and Pneumonia.

But when I create a function that contains the same instruction, where state is a value in the State column and outcome1 is the name of a column:

best <- function(state, outcome1) {
    outcome <- read.csv("Assigment3/outcome-of-care-measures.csv", colClasses = "character")  
    temp <- subset(outcome, State ==state, select=c(Hospital.Name, outcome1))
}

and I call the function:

best("NY","Pneumonia")

I get the error:

Error in `[.data.frame`(x, r, vars, drop = drop) : undefined columns selected

I know the problem is with the outcome1 variable: if I hardcode the column name in the above function instead of passing it in as an argument, the function works as expected.


Solution

  • I think you need get() around outcome1 in your function definition, because you are passing the column name as a string rather than as an object. With this example data:

    outcome <- data.frame(Pneumonia = sample(0:1, size = 5, replace = TRUE),
                          State = c("NY", "NY", "NY", "CA", "CA"),
                          Hospital.Name = LETTERS[1:5]
                          )
    

    And this modified function:

    best <- function(df_, state_, var_) {
      subset(df_, State == state_, select = c(Hospital.Name, get(var_)))
    }          
    

    Now you can call it more or less as before:

    > best(df_ = outcome, state_ = "NY", var_ = "Pneumonia")
      Hospital.Name Pneumonia
    1             A         0
    2             B         1
    3             C         0
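
As a complement to the get() approach, here is a minimal sketch of why the original call fails, plus one alternative that avoids non-standard evaluation. It relies on subset() evaluating `select` in a list that maps each column name to its position, so a bare column name becomes an integer while a string argument stays a string. The data frame below is made up for illustration (fixed values instead of sample()), and best_by_name is a hypothetical name, not something from the question.

```r
# Toy data, made up for illustration; same shape as the answer's example.
outcome <- data.frame(
  Pneumonia     = c(0, 1, 0, 1, 1),
  State         = c("NY", "NY", "NY", "CA", "CA"),
  Hospital.Name = LETTERS[1:5]
)

# Reproduce what subset() does with `select`: column names are mapped to
# their positions, and the expression is evaluated in that list.
positions <- as.list(seq_along(outcome))
names(positions) <- names(outcome)
outcome1 <- "Pneumonia"
mixed <- eval(quote(c(Hospital.Name, outcome1)), positions)
print(mixed)
# [1] "3"         "Pneumonia"
# c() coerced the position 3 to the string "3", and there is no column
# named "3", hence "undefined columns selected".

# Alternative (hypothetical helper): index with `[` and a character vector
# of column names, so the string argument can be used directly.
best_by_name <- function(df, state, column) {
  rows <- which(df$State == state)  # which() drops NA matches, like subset()
  df[rows, c("Hospital.Name", column), drop = FALSE]
}

result <- best_by_name(outcome, "NY", "Pneumonia")
print(result)
```

Plain `[` indexing is usually the safer choice inside functions, since subset() is intended mainly for interactive use; get() works here because it resolves the string to the column's position within that same list.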