Search code examples
rdataframedplyrtidyracross

Create columns that extract and flag a detected string pattern from multiple selected columns


I have the below dataframe, where each observation has a string of key and values (k:v)s of different objects. Each object is separated by a \n

df = tribble(
  ~col, ~summary,
  "a",   "Oranges: 1\nApples: 1\nPeaches: 6\nBananas: 2",
  "b",   "Apples: 2\nBananas: 1",
  "c",   "Oranges: 4\nPeaches: 2\nBananas: 5",
  "d",   "Peaches: 2"
)

I have separated those values as below:

df_sep = df %>% 
  separate(summary,c("col1","col2","col3","col4"),sep = "\n")

I would like to have the below output using dplyr possibly with across and any if applciable.

# A tibble: 4 x 10
  col   summary                                       Apples_flag Bananas_flag oranges_flag Peaches_flag_value Apples_value Bananas_value oranges_value Peaches_value
  <chr> <chr>                                               <dbl>        <dbl>        <dbl>              <dbl>        <dbl>         <dbl>         <dbl>         <dbl>
1 b     Apples: 2, Bananas: 1                                   1            1            0                  0            2             1             0             0
2 a     Oranges: 1, Apples: 1, Peaches: 6, Bananas: 2           1            1            1                  1            1             2             1             6
3 c     Oranges: 4, Peaches: 2, Bananas: 5                      0            1            1                  1            0             5             4             2
4 d     Peaches: 2                                              0            0            0                  1            0             0             0             2

Solution

  • Another way would be:

    df %>%
      left_join(df%>%
      separate_rows(summary, sep='\n') %>%
      separate(summary, c('name', 'value'), convert = TRUE) %>%
      pivot_wider(col,names_glue = '{name}_value', values_fill = 0) %>%
      mutate(across(-col, sign, .names = '{.col}_flag')))