Recast in R gives different value


I have the following dataframe in R

  DF2<-data.frame("ID"=c("A", "A", "A", "B", "B", "B", "B", 'B'), 
  'Freq'=c(1,2,3,1,2,3,4,5), "Val"=c(1,2,4, 2,3,4,5,8))

The dataframe looks like this

   ID Freq Val
1  A    1   1
2  A    2   2
3  A    3   4
4  B    1   2
5  B    2   3
6  B    3   4
7  B    4   5
8  B    5   8

I want to melt and recast the dataframe to yield the following dataframe

   A_Freq A_Value B_Freq B_Value
1      1       1      1       2
2      2       2      2       3
3      3       4      3       4
4     NA      NA      4       5
5     NA      NA      5       8

I have tried the following code

 DF3<-melt(DF2, by=ID)
 DF3$ID<-paste0(DF3$ID, DF3$variable)
 DF3$variable<-NULL
 DF4<-dcast(DF3, value~ID)

This yields the following dataframe

     value AFreq AVal BFreq BVal
 1     1     1    1     1   NA
 2     2     2    2     2    2
 3     3     3   NA     3    3
 4     4    NA    4     4    4
 5     5    NA   NA     5    5
 6     8    NA   NA    NA    8

How can I obtain the desired result shown earlier? I have tried other variations of dcast but have been unable to get it. Could someone please help?


Solution

  • One option would be

    library(tidyverse)
    DF2 %>% 
        gather(key, val, -ID) %>%
        unite(IDkey, ID, key) %>% 
        group_by(IDkey) %>%
        mutate(rn = row_number()) %>% 
        spread(IDkey, val) %>%
        select(-rn)
    # A tibble: 5 x 4
    #  A_Freq A_Val B_Freq B_Val
    #   <dbl> <dbl>  <dbl> <dbl>
    #1      1     1      1     2
    #2      2     2      2     3
    #3      3     4      3     4
    #4     NA    NA      4     5
    #5     NA    NA      5     8
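
The same idea can also be sketched with pivot_longer()/pivot_wider(), which supersede gather()/spread() in tidyr 1.0.0 and later. This is only a rough equivalent of the pipeline above, not a different method; the names `wide`, `key`, `val` and `rn` are illustrative choices, and it assumes dplyr and tidyr are installed.

```r
library(dplyr)
library(tidyr)

DF2 <- data.frame(ID   = c("A", "A", "A", "B", "B", "B", "B", "B"),
                  Freq = c(1, 2, 3, 1, 2, 3, 4, 5),
                  Val  = c(1, 2, 4, 2, 3, 4, 5, 8))

wide <- DF2 %>%
  pivot_longer(-ID, names_to = "key", values_to = "val") %>%  # wide -> long
  unite(IDkey, ID, key) %>%               # "A_Freq", "A_Val", "B_Freq", "B_Val"
  group_by(IDkey) %>%
  mutate(rn = row_number()) %>%           # position within each IDkey, used as row key
  ungroup() %>%
  pivot_wider(names_from = IDkey, values_from = val) %>%      # long -> wide
  select(-rn)

print(wide)
```

Groups with fewer rows (here A) are padded with NA, because pivot_wider() has no value for those rn/IDkey combinations.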
    

    Or use melt/dcast. We melt with ID as the id variable (id.var = "ID", passed as a string) to convert from 'wide' to 'long' format. Then dcast reshapes from 'long' back to 'wide' with the formula rowid(ID, variable) ~ paste(ID, variable, sep="_"). The right-hand side of ~ pastes the ID and variable values together to form the new column names, while rowid() on the left-hand side gives a sequence number within each ID/variable combination, which serves as the row key.

    library(data.table)
    dcast(melt(setDT(DF2), id.var = "ID"), rowid(ID, variable) ~ 
         paste(ID, variable, sep="_"))[, ID := NULL][]
    #   A_Freq A_Val B_Freq B_Val
    #1:      1     1      1     2
    #2:      2     2      2     3
    #3:      3     4      3     4
    #4:     NA    NA      4     5
    #5:     NA    NA      5     8
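
To make the roles of rowid() and paste() easier to see, the one-liner above can be unpacked into separate steps. This is a sketch under a few assumptions: the intermediate names `long`, `rn`, `col` and `wide` are made up for illustration, and as.data.table() is used instead of setDT() so that DF2 itself is not converted in place.

```r
library(data.table)

DF2 <- data.frame(ID   = c("A", "A", "A", "B", "B", "B", "B", "B"),
                  Freq = c(1, 2, 3, 1, 2, 3, 4, 5),
                  Val  = c(1, 2, 4, 2, 3, 4, 5, 8))

# wide -> long: one row per (ID, variable) observation
long <- melt(as.data.table(DF2), id.vars = "ID")

# sequence number within each ID/variable pair: 1..3 for A, 1..5 for B
long[, rn := rowid(ID, variable)]

# target column name for each observation, e.g. "A_Freq"
long[, col := paste(ID, variable, sep = "_")]

# long -> wide: rn keys the rows, col supplies the column names
wide <- dcast(long, rn ~ col, value.var = "value")
wide[, rn := NULL]

print(wide)
```

Printing `long` before the dcast() step shows the rn and col values that the formula in the one-liner computes on the fly.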
    

    In the OP's code, the formula is value ~ ID, so dcast creates a 'value' column with one row per unique element of 'value' and, at the same time, automatically picks 'value' as the value.var. There are six unique values (1, 2, 3, 4, 5, 8), so the result has six rows instead of the expected five.
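
The row counts can be checked in base R without any reshaping. This is only an illustrative sketch: `value` mimics the stacked column that melt produces, and `rn` is a hypothetical per-ID position index playing the role of rowid().

```r
DF2 <- data.frame(ID   = c("A", "A", "A", "B", "B", "B", "B", "B"),
                  Freq = c(1, 2, 3, 1, 2, 3, 4, 5),
                  Val  = c(1, 2, 4, 2, 3, 4, 5, 8))

# Freq and Val stacked into one vector, as in the melted 'value' column
value <- c(DF2$Freq, DF2$Val)

# value ~ ID keys the rows on the distinct values themselves
n_rows_by_value <- length(unique(value))

# keying on the position within each ID gives the desired layout instead
rn <- ave(seq_along(DF2$ID), DF2$ID, FUN = seq_along)
n_rows_by_position <- max(rn)

print(sort(unique(value)))
print(c(by_value = n_rows_by_value, by_position = n_rows_by_position))
```

The first count matches the six-row output in the question, and the second matches the five rows of the desired result.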