I have a data frame called "outcome" with a column called "Pneumonia" and some other columns such as "State" and "Hospital.Name".
When I run the following at the command line:
outcome <- read.csv("Assigment3/outcome-of-care-measures.csv", colClasses = "character")
temp <- subset(outcome, State == "NY", select = c(Hospital.Name, Pneumonia))
it works and creates the temp data frame with two columns, Hospital.Name and Pneumonia.
But when I create a function that contains the same instruction
(state is a value in the State column, and outcome1 is just the column name):
best <- function(state, outcome1) {
  outcome <- read.csv("Assigment3/outcome-of-care-measures.csv", colClasses = "character")
  temp <- subset(outcome, State == state, select = c(Hospital.Name, outcome1))
}
and I call the function:
best("NY","Pneumonia")
I get the error:
Error in `[.data.frame`(x, r, vars, drop = drop) : undefined columns selected
I know the problem is with the outcome1 variable, since if I hardcode the column name in the above function, instead of passing it in as an argument, the function works as expected.
I think you need get() around outcome1 in your function definition, as you are passing the column name as a string rather than as an object. With this example data:
outcome <- data.frame(Pneumonia = sample(0:1, size = 5, replace = TRUE),
                      State = c("NY", "NY", "NY", "CA", "CA"),
                      Hospital.Name = LETTERS[1:5]
)
And this modified function:
best <- function(df_, state_, var_) {
  subset(df_, State == state_, select = c(Hospital.Name, get(var_)))
}
Now you can call it more or less as before:
> best(df_ = outcome, state_ = "NY", var_ = "Pneumonia")
  Hospital.Name Pneumonia
1             A         0
2             B         1
3             C         0
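To see why the original version fails, it may help to look at what subset() does with select: it evaluates the expression with each column name bound to its position, so mixing a bare column name with a string variable yields a character vector containing a column "name" that does not exist. The sketch below reproduces that step by hand and then shows one possible alternative that avoids subset() and get() altogether by indexing with column names as strings. The names col_positions, selected and best2 are made up for illustration, and set.seed() is added only so the sample is reproducible.

```r
# Example data as above; set.seed() is an addition for reproducibility.
set.seed(1)
outcome <- data.frame(
  Pneumonia = sample(0:1, size = 5, replace = TRUE),
  State = c("NY", "NY", "NY", "CA", "CA"),
  Hospital.Name = LETTERS[1:5]
)

# Mimic how subset() evaluates `select`: column names are bound to positions.
var_ <- "Pneumonia"
col_positions <- as.list(seq_along(outcome))
names(col_positions) <- names(outcome)

# Hospital.Name resolves to 3, var_ resolves to the string "Pneumonia",
# and c() coerces both to character.
selected <- eval(quote(c(Hospital.Name, var_)), col_positions)
print(selected)
# [1] "3"         "Pneumonia"
# There is no column called "3", hence "undefined columns selected".

# Hypothetical alternative: plain bracket indexing with string column names.
best2 <- function(df_, state_, var_) {
  df_[df_$State == state_, c("Hospital.Name", var_), drop = FALSE]
}

best2(outcome, "NY", "Pneumonia")
```

Bracket indexing takes the column name as an ordinary string, so it does not depend on how the argument happens to be named inside the function. One caveat under this approach: if State contains NA values, the logical index will produce NA rows, which subset() would have dropped.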