Search code examples
rdata.tablereshapetidyrreshape2

R: Reshaping Multiple Columns from Long to Wide


Using following data:

library(tidyverse)

sample_df <- data.frame(Letter = c("a", "a", "a", "b", "b"),
                        Number = c(1,2,1,3,4),
                        Fruit = c("Apple", "Plum", "Peach", "Pear", "Peach"))


Letter Number Fruit
a      1      Apple
a      2      Plum
a      1      Peach
b      3      Pear
b      4      Peach

I want to transform a set of values from a long to a wide format:

Letter Number_1 Number_2 Fruit_1 Fruit_2 Fruit_3
a      1        2        Apple   Plum    Peach
b      3        4        Pear    Peach   

To do so, I unsuccessfully tried to create an index of each unique group combinations using c("Letter", "Number") and c("Letter", "Fruit"). Firstly, does this index need to be created, and if so how should it be done?

# Gets Unique Values, but no Index of Unique Combinations
sample_df1 <- sample_df %>%
  group_by(Letter) %>%  
  mutate(Id1 = n_distinct(Letter, Number),
         Id2 = n_distinct(Letter, Fruit))

# Gets Following Error: Column `Id1` must be length 3 (the group size) or one, not 2
sample_df1 <- sample_df %>%
  group_by(Letter) %>%  
  mutate(Id1 = 1:n_distinct(Letter, Number),
         Id2 = 1:n_distinct(Letter, Fruit))

# NOTE: Manually Created the Index Columns to show next problem
sample_df1 <- sample_df %>%
  group_by(Letter) %>%  
  add_column(Id1 = c(1,2,1,1,2),
         Id2 = c(1,2,3,1,2)) 

Assuming it did need to be done, I manually appended the desired values, and partially solved the problem using developmental tidyr.

# Requires Developmental Tidyr
devtools::install_github("tidyverse/tidyr")  

sample_df1 %>%  
  pivot_wider(names_from = c("Id1", "Id2"), values_from = c("Number", "Fruit")) %>%
  set_names(~ str_replace_all(.,"(\\w+.*)(_\\d)(_\\d)", "\\1\\3"))

#  Letter Number_1 Number_2 Number_3 Fruit_1 Fruit_2 Fruit_3
#<fct>     <dbl>    <dbl>    <dbl> <fct>   <fct>   <fct>  
# a         1        2        1     Apple   Plum    Peach  
# b         3        4       NA     Pear    Peach   NA     

However, this approach still created an unwanted Number_3 column. Using any tidyr, data.table or any other package, is there any way of getting the data in the desired format without duplicating columns?


Solution

  • An option would be to replace the duplicated elements by 'Letter' to NA and then in the reshaped data, remove the columns that are all NA

    library(data.table)
    out <- dcast(setDT(sample_df)[, lapply(.SD, function(x) 
         replace(x, duplicated(x), NA)), Letter], Letter ~ rowid(Letter), 
         value.var = c("Number", "Fruit"))
    nm1 <- out[, names(which(!colSums(!is.na(.SD))))]
    out[, (nm1) := NULL][]
    #   Letter Number_1 Number_2 Fruit_1 Fruit_2 Fruit_3
    #1:      a        1        2   Apple    Plum   Peach
    #2:      b        3        4    Pear   Peach    <NA>
    

    If we want to use the tidyverse approach, a similar option can be used. Note that pivot_wider is from the dev version of tidyr (tidyr_0.8.3.9000)

    library(tidyverse)
    sample_df %>% 
         group_by(Letter) %>%
         mutate_at(vars(-group_cols()), ~ replace(., duplicated(.), NA)) %>%
         mutate(rn = row_number()) %>% 
      pivot_wider(
              names_from = rn,
              values_from = c("Number", "Fruit")) %>%
      select_if(~ any(!is.na(.)))
    # A tibble: 2 x 6
    #  Letter Number_1 Number_2 Fruit_1 Fruit_2 Fruit_3
    #  <fct>     <dbl>    <dbl> <fct>   <fct>   <fct>  
    #1 a             1        2 Apple   Plum    Peach  
    #2 b             3        4 Pear    Peach   <NA>