Search code examples
rdataframedplyrpivottidyr

How to convert rows into columns by group?


I'd like to do association analysis using apriori algorithm.

To do that, I have to make a dataset.

What I have data is like this.

data.frame("order_number"=c("100145",  "100155", "100155", "100155", 
"500002", "500002", "500002", "500007"),
"order_item"=c("27684535","15755576", 
"1357954","124776249","12478324","15755576","13577","27684535"))
 order_number order_item

1       100145   27684535

2       100155   15755576

3       100155    1357954

4       100155  124776249

5       500002   12478324

6       500002   15755576

7       500002      13577

8       500007   27684535

and I want to transfer the data like this

data.frame("order_number"=c("100145","100155","500002","500007"), 
"col1"=c("27684535","15755576","12478324","27684535"),
"col2"=c(NA,"1357954","15755576",NA),
"col3"=c(NA,"124776249","13577",NA))
 order_number     col1     col2      col3

1       100145 27684535     <NA>      <NA>

2       100155 15755576  1357954 124776249

3       500002 12478324 15755576     13577

4       500007 27684535     <NA>      <NA>

Thank you for your help.


Solution

  • This would be a case of pivot_wider (or other functions for changing column layout). First step would be creating a row id variable to note whether each is 1, 2 or 3, then shaping this into the dataframe you want:

    df <- data.frame("order_number"=c("100145",  "100155", "100155", "100155", 
                                "500002", "500002", "500002", "500007"),
               "order_item"=c("27684535","15755576", 
                              "1357954","124776249","12478324","15755576","13577","27684535"))
    
    library(tidyr)
    library(dplyr)
    
    
    df |>
      group_by(order_number) |> 
      mutate(rank = row_number()) |> 
      pivot_wider(names_from = rank, values_from = order_item,
                  names_prefix = "col")
    #> # A tibble: 4 × 4
    #> # Groups:   order_number [4]
    #>   order_number col1     col2     col3     
    #>   <chr>        <chr>    <chr>    <chr>    
    #> 1 100145       27684535 <NA>     <NA>     
    #> 2 100155       15755576 1357954  124776249
    #> 3 500002       12478324 15755576 13577    
    #> 4 500007       27684535 <NA>     <NA>