I have a loop going through my data frame to fill a new column based on other column values. I need to identify if there is a 'yes' in more than one column.
Dataset:
df <- data.frame(name=c("AJ", "DJ", "EJ", "MJ", "CJ"),
                 meds_1=c("yes", "yes", "no", "no", "yes"),
                 meds_2=c("no", "no", "no", "yes", "yes"),
                 meds_3=c("no", "no", "no", "no", "no"),
                 meds_4=c("no", "no", "no", "no", "no"),
                 status=c("Non-stable", "Non-stable", "stable", "stable", "Non-stable"))
#the new column
df$status_check <- NA
The order should go:
If status == 'stable', then status_check = 'stable'; else if someone has 'yes' in more than one column that starts with 'meds_', then status_check = 'Combo'; else 'Other'.
The issue is that I'm not sure how to loop over the columns that start with 'meds_', count the number of 'yes' values in each row, and set status_check to 'Combo' if that count is over 1.
My loop so far:
#I am not sure what function needs to go into the second ifelse statement.
for(i in 1:nrow(df)){
  df$status_check <- ifelse(df$status == "stable", "stable",
                            ifelse([some function], "Combo", "Other"))
}
The results should look like:
  name meds_1 meds_2 meds_3 meds_4     status status_check
1   AJ    yes     no     no     no Non-stable        Other
2   DJ    yes     no     no     no Non-stable        Other
3   EJ     no     no     no     no     stable       stable
4   MJ     no    yes     no     no     stable       stable
5   CJ    yes    yes     no     no Non-stable        Combo
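With dplyr, we can start with rowSums to count the 'yes' values in the meds columns of each row; starts_with is very useful for selecting those columns. Then a simple case_when does the trick: rows with status 'stable' stay 'stable', rows with a count >= 2 become 'combo', and everything else becomes 'other'.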
Note: I used the data from the desired output data frame, as it differs from the original data.
library(dplyr)
df %>%
  mutate(status_check = rowSums(across(starts_with('meds_'), ~.x == 'yes')),
         status_check = case_when(status == 'stable' ~ 'stable',
                                  status_check >= 2 ~ 'combo',
                                  TRUE ~ 'other'))
  name meds_1 meds_2 meds_3 meds_4     status status_check
1   AJ    yes     no     no     no Non-stable        other
2   DJ    yes     no     no     no Non-stable        other
3   EJ     no     no     no     no     stable       stable
4   MJ     no    yes     no     no     stable       stable
5   CJ    yes    yes     no     no Non-stable        combo
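For comparison, the same logic can probably be written in base R without a loop, since ifelse is already vectorised over rows. This is only a minimal sketch, assuming the data frame from the question; meds_cols and n_yes are illustrative names that do not appear in the original post, and the labels follow the capitalised 'Combo' / 'Other' from the question's expected output.

```r
# Data as given in the question
df <- data.frame(
  name   = c("AJ", "DJ", "EJ", "MJ", "CJ"),
  meds_1 = c("yes", "yes", "no", "no", "yes"),
  meds_2 = c("no", "no", "no", "yes", "yes"),
  meds_3 = c("no", "no", "no", "no", "no"),
  meds_4 = c("no", "no", "no", "no", "no"),
  status = c("Non-stable", "Non-stable", "stable", "stable", "Non-stable")
)

# Logical vector marking the columns whose names start with "meds_"
# (meds_cols is an illustrative name, not from the original post)
meds_cols <- startsWith(names(df), "meds_")

# Number of 'yes' values per row across those columns
# (n_yes is an illustrative name, not from the original post)
n_yes <- rowSums(df[meds_cols] == "yes")

# 'stable' takes priority; otherwise 'Combo' when more than one 'yes'
df$status_check <- ifelse(df$status == "stable", "stable",
                          ifelse(n_yes >= 2, "Combo", "Other"))

print(df)
```

If the meds columns can contain NA, rowSums would return NA for those rows unless na.rm = TRUE is passed; whether that is the right treatment depends on the data.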