Consider this example data:
library(tidyverse)
dt <- tibble(
  Poison = c('Arsenic', 'Arsenic in Wine', 'Cyanide', 'Cyanide and Sugar'),
  Result = c('Death', 'Death With Class', 'Death', 'Death')
)
I want to create a column that gives each group an identification number. However, I want the poisons to be grouped by string detection, i.e., 'Arsenic' and 'Arsenic in Wine' should be one group and 'Cyanide' and 'Cyanide and Sugar' another. Currently, R treats each distinct value as its own group:
dt <- dt %>%
  group_by(Poison) %>%
  mutate(Group = n())
# A tibble: 4 × 3
# Groups:   Poison [4]
  Poison            Result           Group
  <chr>             <chr>            <int>
1 Arsenic           Death                1
2 Arsenic in Wine   Death With Class     1
3 Cyanide           Death                1
4 Cyanide and Sugar Death                1
I want 'Arsenic' and 'Arsenic in Wine' to be Group 1, and 'Cyanide' and 'Cyanide and Sugar' to be Group 2. Any ideas?
A combination of case_when and grepl could be useful:
dt %>%
  mutate(Group = case_when(
    grepl("Arsenic", Poison) ~ 1,
    grepl("Cyanide", Poison) ~ 2
  ))
# A tibble: 4 × 3
  Poison            Result           Group
  <chr>             <chr>            <dbl>
1 Arsenic           Death                1
2 Arsenic in Wine   Death With Class     1
3 Cyanide           Death                2
4 Cyanide and Sugar Death                2
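One caveat with this approach: a row that matches none of the patterns gets NA. A minimal sketch of that behaviour, assuming a made-up extra value ('Hemlock' is not in the original data) and using integer literals so that Group comes out as <int> rather than <dbl>:

```r
library(dplyr)

# 'Hemlock' is a hypothetical value, added only to show the unmatched case.
poisons <- tibble(Poison = c("Arsenic", "Arsenic in Wine", "Cyanide", "Hemlock"))

res <- poisons %>%
  mutate(Group = case_when(
    grepl("Arsenic", Poison, fixed = TRUE) ~ 1L,
    grepl("Cyanide", Poison, fixed = TRUE) ~ 2L,
    TRUE ~ NA_integer_  # explicit fallback; unmatched rows would be NA anyway
  ))
```

Here fixed = TRUE makes grepl treat the pattern as a literal string instead of a regular expression, which is probably what you want for plain names.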
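If you don't want to write out every poison by hand, you could use the first word of each name as the grouping key (sub(" .*", "", Poison) drops everything from the first space onwards):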
dt %>%
  mutate(Group = sub(" .*", "", Poison) %>%
           as.factor() %>%
           as.integer())
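If you would rather stay with group_by, as in the question, a possible variant is to group on the first word and let dplyr number the groups with cur_group_id() (available in dplyr 1.0.0 and later). This is only a sketch, under the assumption that the first word always identifies the poison; Key is a hypothetical helper column that is dropped again at the end.

```r
library(dplyr)

dt <- tibble(
  Poison = c("Arsenic", "Arsenic in Wine", "Cyanide", "Cyanide and Sugar"),
  Result = c("Death", "Death With Class", "Death", "Death")
)

out <- dt %>%
  # Assumption: the text before the first space identifies the group.
  group_by(Key = sub(" .*", "", Poison)) %>%
  # cur_group_id() returns an integer id for the current group.
  mutate(Group = cur_group_id()) %>%
  ungroup() %>%
  select(-Key)  # drop the helper column
```

Group ids follow the sorted order of the keys, so here 'Arsenic' rows get 1 and 'Cyanide' rows get 2.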