Tags: r, conditional-statements, tidyverse, dplyr, grepl

How can I automate this simple conditional column operation in R?


I have a data frame that looks like the following:

tibble(term = c(
  rep("a:b", 2),
  rep("b:a", 2),
  rep("c:d", 2),
  rep("d:c", 2),
  rep("g:h", 2),
  rep("h:g", 2)
)) 

I would like to add an extra column to this data frame that takes the same value for any pair of terms containing the same characters in reversed order, separated by a ":" (i.e. a:b and b:a would be coded the same way; likewise c:d and d:c, and all the other pairs).

I thought of something like the following:

your_data %>%
  mutate(term_adjusted = case_when(grepl("a:b|b:a", term) ~ "a:b"))

but I have a large number of these pairs in my dataset and would like a way to automate that, hence my question:

How can I do this operation automatically without having to hard code for each pair separately?

Thank you!


Solution

  • How about:

    library(dplyr)
    
    your_data %>%
      mutate(term_adjusted = term %>%
                               strsplit(":") %>%
                               purrr::map_chr(~ .x %>%
                                               sort() %>%
                                               paste(collapse = ":")))
    

    Base R option

    your_data$term_adjusted <- your_data$term |>
                                 strsplit(":") |>
                                 lapply(sort) |>
                                 lapply(paste, collapse = ":") |>
                                 unlist()
    

    Either approach returns:

    # A tibble: 12 x 2
       term  term_adjusted
       <chr> <chr>
     1 a:b   a:b
     2 a:b   a:b
     3 b:a   a:b
     4 b:a   a:b
     5 c:d   c:d
     6 c:d   c:d
     7 d:c   c:d
     8 d:c   c:d
     9 g:h   g:h
    10 g:h   g:h
    11 h:g   g:h
    12 h:g   g:h
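
For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same split, sort, paste idea in base R. The helper name `normalize_pair` and its `sep` argument are made up for illustration and are not part of the original answer. It also passes `method = "radix"` to `sort()`, on the assumption that you want an ordering that does not depend on the session locale.

```r
# Hypothetical helper (not from the original answer): rewrite each "x:y" term
# so that its parts appear in sorted order, e.g. "b:a" becomes "a:b".
normalize_pair <- function(term, sep = ":") {
  # fixed = TRUE treats `sep` as a literal string rather than a regex
  parts <- strsplit(term, sep, fixed = TRUE)
  vapply(
    parts,
    # method = "radix" sorts character vectors in C-locale order
    function(p) paste(sort(p, method = "radix"), collapse = sep),
    character(1)
  )
}

# Same 12 rows as the example data in the question
your_data <- data.frame(
  term = rep(c("a:b", "b:a", "c:d", "d:c", "g:h", "h:g"), each = 2)
)

your_data$term_adjusted <- normalize_pair(your_data$term)

unique(your_data$term_adjusted)
# [1] "a:b" "c:d" "g:h"
```

Using `vapply()` with `character(1)` rather than `lapply()` plus `unlist()` is only a design preference: it fails loudly if any element does not collapse to a single string.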