Search code examples
rdplyrreshape2dcast

I have a long data set. How do I convert to wide format while using a rank function to limit the number of columns created?


I have a data set in a long format. Some records only have one manufacturer, some records are duplicated because there are multiple manufacturers associated with each record.

For example, I have the following data:

id      manufacturer
111     AAA
222     AAA
222     BBB
222     CCC
222     DDD
333     CCC
333     DDD
444     EEE

Using dplyr, I added a rank so I am able to count the number of manufacturers.

df %>% 
  select(id, manufacturer) %>% 
  group_by(id) %>% 
  mutate(rank = rank(manufacturer, ties.method = "first"))

To get:

id      manufacturer    rank
111     AAA             1
222     AAA             1
222     BBB             2
222     CCC             3
222     DDD             4
333     CCC             1
333     DDD             2
444     EEE             1

I've tried dcast which gets me close, but I'm not quite there.

dcast(df, id ~ rank)

When I use dcast I get:

id     1    2    3    4
111    1    NA   NA   NA
222    1    2    3    4
333    1    2    NA   NA
444    1    NA   NA   NA

What I am trying to get is:

id     manuf_1    manuf_2    manuf_3    manuf_4
111    AAA                                
222    AAA        BBB        CCC        DDD
333    CCC        DDD        
444    EEE                          

Solution

  • You just need to set the value.var, and mutate rank to your desired format before dcast

    df %>% 
      dplyr::mutate(rank = paste0('manuf_', rank)) %>% 
      reshape2::dcast(id ~ rank, value.var = 'manufacturer')
    #    id manuf_1 manuf_2 manuf_3 manuf_4
    # 1 111     AAA    <NA>    <NA>    <NA>
    # 2 222     AAA     BBB     CCC     DDD
    # 3 333     CCC     DDD    <NA>    <NA>
    # 4 444     EEE    <NA>    <NA>    <NA>
    

    data used

    df <- data.table::fread('
    id      manufacturer    rank
    111     AAA             1
    222     AAA             1
    222     BBB             2
    222     CCC             3
    222     DDD             4
    333     CCC             1
    333     DDD             2
    444     EEE             1
    ')