Search code examples
rdata-cleaning

R language auto Concat the multiple newly added columns


I have the following data frame, I want to concat the columns of item with their quanity. The new items with genertic or brand item name added everyday as a new columns with their quantity. I want to detect all the items in R language and their quantity automatically and want to concat into to mentioned expected output.

id    date        item_>>1_generic  item_>>1_brand    item_>>2_generic  item_>>2_brand  quantity_>>1_daily  quantity_>>2_daily
z1   2022-02-28   NA                name1             name11            NA                10                  20
z1   2021-10-31   name2             NA                name21            NA                10m                 20m
z2   2021-12-31   NA                name3             name31            NA                20mg                20
r3   2021-10-31   name4             NA                name41            NA                40                  50
r4   2021-06-30   NA                name5             NA                name51            50                  60
r5   2021-08-31   name6             NA                NA                name61            10                  30

My expected output is:

id    date         item1_quanity   item2_quanity
z1   2022-02-28   name1_10         name11_20 
z1   2021-10-31   name2_10m        name21_20m       
z2   2021-12-31   name3_20mg       name31_20
r3   2021-10-31   name4_40         name41_50        
r4   2021-06-30   name5_50         name51_60
r5   2021-08-31   name6_10         name61_30            

Solution

  • df %>%
      rename_with(~str_remove(., '>>')) %>%
      pivot_longer(-c(id, date), names_to = c('item','num', '.value'), 
                   names_sep = '_') %>%
      group_by(id, date, num) %>%
      mutate(generic = coalesce(generic, brand), 
             daily = na.omit(daily)) %>%
      filter(!is.na(generic)) %>%
      mutate(generic = str_c(generic, daily, sep = '_')) %>%
      select(-brand, -daily) %>%
      pivot_wider(names_from = c(item, num), values_from = generic,)
    
    # A tibble: 6 x 4
    # Groups:   id, date [6]
      id    date       item_1     item_2    
      <chr> <chr>      <chr>      <chr>     
    1 z1    2022-02-28 name1_10   name11_20 
    2 z1    2021-10-31 name2_10m  name21_20m
    3 z2    2021-12-31 name3_20mg name31_20 
    4 r3    2021-10-31 name4_40   name41_50 
    5 r4    2021-06-30 name5_50   name51_60 
    6 r5    2021-08-31 name6_10   name61_30