I have a data frame that looks like the following:
tibble(term = c(
  rep("a:b", 2),
  rep("b:a", 2),
  rep("c:d", 2),
  rep("d:c", 2),
  rep("g:h", 2),
  rep("h:g", 2)
))
I would like to add an extra column to this data frame that takes the same value for any two terms made of the same characters in reversed order around the ":" separator (i.e. a:b and b:a would be coded the same way, and likewise c:d and d:c and all the other pairs).
I thought of something like the following:
%>%
mutate(term_adjusted = case_when(grepl("a:b|b:a", term) ~ "a:b"))
but I have a large number of these pairs in my dataset and would like a way to automate that, hence my question:
How can I do this operation automatically without having to hard code for each pair separately?
Thank you!
How about:
library(dplyr)

your_data %>%
  mutate(term_adjusted = term %>%
           strsplit(":") %>%                  # split each term into its parts
           purrr::map_chr(~ .x %>%
                            sort() %>%        # put the parts in a canonical order
                            paste(collapse = ":")))
Base R option (the native pipe |> requires R 4.1 or later):
your_data$term_adjusted <- your_data$term |>
  strsplit(":") |>
  lapply(sort) |>
  lapply(paste, collapse = ":") |>
  unlist()
Either returns:
# A tibble: 12 x 2
   term  term_adjusted
   <chr> <chr>
 1 a:b   a:b
 2 a:b   a:b
 3 b:a   a:b
 4 b:a   a:b
 5 c:d   c:d
 6 c:d   c:d
 7 d:c   c:d
 8 d:c   c:d
 9 g:h   g:h
10 g:h   g:h
11 h:g   g:h
12 h:g   g:h
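If it helps, here is a self-contained sketch of the same split, sort, paste idea wrapped in a function, so it can be reused on other columns. The name normalise_pair is a hypothetical helper (not from any package), the example data is rebuilt as a plain data frame, and it assumes the parts of each term are separated by a literal ":".

```r
# Rebuild the example data from the question (plain data frame, no packages needed).
your_data <- data.frame(
  term = c(rep("a:b", 2), rep("b:a", 2), rep("c:d", 2),
           rep("d:c", 2), rep("g:h", 2), rep("h:g", 2)),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split on ":", sort the parts, paste them back together.
normalise_pair <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  # vapply() guarantees one character value per input element.
  vapply(parts, function(p) paste(sort(p), collapse = ":"), character(1))
}

your_data$term_adjusted <- normalise_pair(your_data$term)

# Distinct canonical labels after normalising.
unique(your_data$term_adjusted)
```

On the example data this should leave only three distinct labels (a:b, c:d and g:h). Note that sort() follows the current locale's collation, so which member of a pair becomes the label could differ for terms with mixed case or accented characters.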