Search code examples
rdataframedata-analysisdata-cleaning

R Language to convert the data frame into specific format


I have the following data frame that I am trying to convert into a specific format. The new items with their quantity are added daily as columns. I want to detect all the items in R language and their quantity automatically and convert into to mentioned expected output.

My dataframe: (new items with their quantity added everyday as new columns)

id    date        a1_item   a1_quantity   a2_item   a2_quantity
z1   2022-02-28   name1       10           name11     20
z1   2021-10-31   name2       20           name21     30
z2   2021-12-31   name3       10           name31     40
r3   2021-10-31   name4       40           name41     10
r4   2021-06-30   name5       30           name51     05
r5   2021-08-31   name6       10           name61     40

My expected output:

id    date         item   quanity
z1   2022-02-28   name1       10 
z1   2022-02-28   name11      20       
z1   2021-10-31   name2       20
z1   2021-10-31   name21      30         
z2   2021-12-31   name3       10
z2   2021-12-31   name31      40            
r3   2021-10-31   name4       40
r3   2021-10-31   name41      10            
r4   2021-06-30   name5       30
r4   2021-06-30   name51      05           
r5   2021-08-31   name6       10 
r5   2021-08-31   name61      40

       

Solution

  • We can use pivot_longer with names_pattern to capture the substring as a group

    library(dplyr)
    library(tidyr)
    df1 %>% 
      pivot_longer(cols = contains("_"), names_to = c(".value"), 
           names_pattern = ".*_(.*)")
    

    -output

    # A tibble: 12 × 4
       id    date       item   quantity
       <chr> <chr>      <chr>     <int>
     1 z1    2022-02-28 name1        10
     2 z1    2022-02-28 name11       20
     3 z1    2021-10-31 name2        20
     4 z1    2021-10-31 name21       30
     5 z2    2021-12-31 name3        10
     6 z2    2021-12-31 name31       40
     7 r3    2021-10-31 name4        40
     8 r3    2021-10-31 name41       10
     9 r4    2021-06-30 name5        30
    10 r4    2021-06-30 name51        5
    11 r5    2021-08-31 name6        10
    12 r5    2021-08-31 name61       40
    

    data

    df1 <- structure(list(id = c("z1", "z1", "z2", "r3", "r4", "r5"), 
    date = c("2022-02-28", 
    "2021-10-31", "2021-12-31", "2021-10-31", "2021-06-30", "2021-08-31"
    ), a1_item = c("name1", "name2", "name3", "name4", "name5", "name6"
    ), a1_quantity = c(10L, 20L, 10L, 40L, 30L, 10L), a2_item = c("name11", 
    "name21", "name31", "name41", "name51", "name61"), a2_quantity = c(20L, 
    30L, 40L, 10L, 5L, 40L)), class = "data.frame", row.names = c(NA, 
    -6L))