I have a dataframe which I subset into three. In the original dataframe I can split the data further on a variable, but once I subset it I can no longer do this; I get the error "$ operator is invalid for atomic vectors". I am unclear why this is the case. Does anyone have any ideas?
I cannot really provide a minimal reproducible example, but below is the code used.
#Original dataset = CT_variable_Biom
##First splitting into three categories
CT_variable_Biom <- CT_variable_Biom %>%
  mutate(
    level_of_risk = case_when(
      high_risk == 1 ~ "high",
      medium_risk == 1 ~ "medium",
      low_risk == 1 ~ "low",
      TRUE ~ NA_character_
    )
  )
medium_risk <- subset(CT_variable_Biom, CT_variable_Biom$level_of_risk=="medium")
high_risk <- subset(CT_variable_Biom, CT_variable_Biom$level_of_risk=="high")
low_risk <- subset(CT_variable_Biom, CT_variable_Biom$level_of_risk=="low")
#Split based on level
#This one works as normal
False_Negatives_overall <- subset(CT_variable_Biom , CT_variable_Biom$Biomarker<0.25)
#This one returns $ operator is invalid for atomic vectors
False_Negatives <- subset(medium_risk, medium_risk$Biomarker<0.25)
I assume that something in my subsetting into the three categories is causing this, but I am not sure what.
Many thanks.
The problem is that your dataframe has a column named medium_risk. When you evaluate
False_Negatives <- subset(medium_risk, medium_risk$Biomarker<0.25)
the subset() function needs to evaluate medium_risk$Biomarker. It searches the columns of the dataframe before looking for the global variable medium_risk, and it finds the column. That column is an atomic vector rather than a dataframe, so applying $ to it raises the error. In this case the simplest fix is the one suggested by @clp, i.e. just use
False_Negatives <- subset(medium_risk, Biomarker<0.25)
In this expression subset() will be looking for Biomarker, and it finds that column.
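If it helps, the name clash can be reproduced on a small scale. The following is only a sketch with made-up values: the toy dataframe df is hypothetical, and only the column names medium_risk and Biomarker come from the question.

```r
# Toy data with hypothetical values; column names mirror the question.
df <- data.frame(
  medium_risk = c(1, 0, 1, 1),
  Biomarker   = c(0.10, 0.20, 0.30, 0.05)
)

# Create a global variable that shares its name with a column of df.
medium_risk <- subset(df, medium_risk == 1)

# Inside subset(), `medium_risk` resolves to the numeric column first,
# so `medium_risk$Biomarker` applies `$` to an atomic vector and fails.
res <- tryCatch(
  subset(medium_risk, medium_risk$Biomarker < 0.25),
  error = function(e) conditionMessage(e)
)
print(res)  # the error message, captured as a string

# Using the bare column name avoids the clash.
ok <- subset(medium_risk, Biomarker < 0.25)
print(ok)
```

The first argument of subset() is evaluated normally, so it still picks up the global dataframe; only the condition is looked up among the columns first.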
This is an example of why the ?subset documentation says "This is a convenience function intended for use interactively. For programming it is better to use the standard subsetting functions like [, and in particular the non-standard evaluation of argument subset can have unanticipated consequences."
The standard way to do this would be
False_Negatives <- medium_risk[medium_risk$Biomarker<0.25, ]
and that is unambiguous, because only standard evaluation is used.
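One difference to keep in mind when switching from subset() to [ is how missing values are treated: subset() drops rows where the condition is NA, while logical indexing with [ returns a row of NAs for each of them. A minimal sketch, assuming a hypothetical toy dataframe with one missing Biomarker value (the id column is also made up):

```r
# Hypothetical toy data with one missing Biomarker value.
df <- data.frame(
  id        = 1:3,
  Biomarker = c(0.10, NA, 0.30)
)

# Logical indexing: the NA in the condition produces an all-NA row.
with_na <- df[df$Biomarker < 0.25, ]

# Wrapping the condition in which() keeps only rows where it is TRUE.
no_na <- df[which(df$Biomarker < 0.25), ]

# subset() treats an NA condition as FALSE.
via_subset <- subset(df, Biomarker < 0.25)

print(nrow(with_na))
print(nrow(no_na))
print(nrow(via_subset))
```

If Biomarker can be missing in the real data, the which() form gives the same rows as subset() while still using only standard evaluation.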
Personally I prefer using subset(), but I try to avoid using any variables that aren't columns of the dataframe. It's not always possible to do that; when things are complicated or when I don't have control of the column names, it's better to follow the advice from the documentation.
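One way to sidestep the clash altogether is to avoid creating global variables that share names with columns. As a sketch only, the three subsets could be kept in a single list built with split(); the name by_risk and the toy values below are hypothetical, not from the question.

```r
# Hypothetical toy data; column names mirror the question.
df <- data.frame(
  medium_risk   = c(1, 0, 1, 0),
  level_of_risk = c("medium", "high", "medium", "low"),
  Biomarker     = c(0.10, 0.20, 0.30, 0.05)
)

# One list with an element per risk level, instead of three globals
# named medium_risk, high_risk and low_risk.
by_risk <- split(df, df$level_of_risk)

# No global variable shares a name with a column, so there is no ambiguity.
false_negatives_medium <- subset(by_risk$medium, Biomarker < 0.25)
print(false_negatives_medium)
```

Note that split() drops rows whose grouping value is NA, which matches the behaviour of the three subset() calls in the question.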