I have a dataset with horses and want to group them based on coat color. More than 140 colors are used in my dataset; I would like to reduce these to only a few coat colors and assign the rest to Other. For some horses, however, the coat color has not been registered, i.e. those are unknown. Below is what the new colors should be. (To illustrate the problem I show an old coat color and a new one, but I simply want to change the coat colors in place, not create a new column with colors.)
Horse ID | Coatcolor (old) | Coatcolor (new) |
---|---|---|
1 | black | Black |
2 | bayspotted | Spotted |
3 | chestnut | Chestnut |
4 | grey | Grey |
5 | cream dun | Other |
6 | | Unknown |
7 | blue roan | Other |
8 | chestnutgrey | Grey |
9 | blackspotted | Spotted |
10 | | Unknown |
Instead, I get the data below (second table), where Unknown and Other are switched.
Horse ID | Coatcolor |
---|---|
1 | Black |
2 | Spotted |
3 | Chestnut |
4 | Grey |
5 | Unknown |
6 | Other |
7 | Unknown |
8 | Grey |
9 | Spotted |
10 | Other |
I used the following code:

    mydata <- data %>%
      mutate(Coatcolor = case_when(
        str_detect(Coatcolor, "spotted") ~ "Spotted",
        str_detect(Coatcolor, "grey") ~ "Grey",
        str_detect(Coatcolor, "chestnut") ~ "Chestnut",
        str_detect(Coatcolor, "black") ~ "Black",
        str_detect(Coatcolor, "") ~ "Unknown",
        TRUE ~ Coatcolor
      ))
    mydata$Coatcolor[!mydata$Coatcolor %in% c("Spotted", "Grey", "Chestnut", "Black", "Unknown")] <- "Other"

So what am I doing wrong/missing? Thanks in advance.
You can use the `recode` function of the `dplyr` package. Assuming the missing spots are `NA`s, you can first set all `NA`s to "Unknown" with `replace_na` of the `tidyr` package and then recode the remaining values. The details depend on how the missing values are stored in your data.

    library(dplyr)
    library(tidyr)

    # Toy data: ten horses with placeholder coat colors
    mydata <- tibble(
      id = 1:10,
      coatcol = letters[1:10]
    )
    mydata$coatcol[5] <- NA  # missing value stored as NA
    mydata$coatcol[4] <- ""  # missing value stored as an empty string

    mydata <- mydata %>%
      mutate(across(where(is.character), ~ na_if(.x, ""))) %>%  # convert empty strings to NA
      mutate(Coatcolor_old = replace_na(coatcol, "Unknown")) %>%  # set all NA to "Unknown"
      mutate(Coatcolor_new = recode(
        Coatcolor_old,
        'spotted' = 'Spotted',
        'bayspotted' = 'Spotted',
        'old_name' = 'new_name',
        'a' = 'A'  # etc.
      ))
    mydata
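
If you would rather keep the pattern-based `case_when` from the question than list all 140 colors in `recode`, a minimal sketch could look like the one below. The pattern `""` does not test for an empty or missing value, so this version checks for missing values explicitly and first, and lets the final `TRUE` branch catch everything else as "Other". The `horses` tibble and its `HorseID` column are made up for illustration, and I am assuming the unregistered colors are stored as `NA` or as empty strings.

```r
library(dplyr)
library(stringr)

# Hypothetical stand-in for the real data; missing colors are NA or "".
horses <- tibble(
  HorseID = 1:10,
  Coatcolor = c("black", "bayspotted", "chestnut", "grey", "cream dun",
                NA, "blue roan", "chestnutgrey", "blackspotted", "")
)

horses <- horses %>%
  mutate(Coatcolor = case_when(
    is.na(Coatcolor) | Coatcolor == "" ~ "Unknown",   # missing values first
    str_detect(Coatcolor, "spotted")   ~ "Spotted",
    str_detect(Coatcolor, "grey")      ~ "Grey",
    str_detect(Coatcolor, "chestnut")  ~ "Chestnut",
    str_detect(Coatcolor, "black")     ~ "Black",
    TRUE                               ~ "Other"      # everything else
  ))
horses
```

Because `case_when` uses the first condition that is true, the order matters: "chestnutgrey" ends up as Grey and "blackspotted" as Spotted, as in the desired table. This overwrites `Coatcolor` in place, so no extra column and no separate `%in%` step are needed.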