Search code examples
rdcast

dcast issue on dataframe consisting two columns


I have a dataframe with two columns and trying to convert it two wide format as shown below. However, I am not able to figure out for a data frame consisting only a numeric and a non numeric column.

library(tidyverse)
library(reshape2)

sample_frame<-data.frame(id=c("x","x","y","y","z","z"),value=c(1,2,3,4,5,6))

desired_output<-cbind(x=filter(sample_frame,id=="x")$value,
                       y=filter(sample_frame,id=="y")$value,
                      z=filter(sample_frame,id=="z")$value)
sample_frame
desired_output

desired_output<-dcast(sample_frame,id~.,value.var="value")
desired_output

> sample_frame
  id value
1  x     1
2  x     2
3  y     3
4  y     4
5  z     5
6  z     6
> desired_output
     x y z
[1,] 1 3 5
[2,] 2 4 6
> 
> desired_output<-dcast(sample_frame,id~.,value.var="value")
Aggregation function missing: defaulting to length
> desired_output
  id .
1  x 2
2  y 2
3  z 2

As shown above using dcast is giving me aggregation.


Solution

  • Use sequence numbers within id for the left hand side. We use 1:2 because each id spans two rows.

    dcast(1:2 ~ id, data = sample_frame)[-1]
    ##   x y z
    ## 1 1 3 5
    ## 2 2 4 6
    

    If you don't know how many rows each id spans or if the rows are not adjacent or if not all id's span the same number of rows use this to generate the left hand side.

    s <- with(sample_frame, ave(value, id, FUN = seq_along))
    dcast(s ~ id, data = sample_frame)[-1]
    

    See the comments below for an alternative to s if using data.table is ok.

    Base solution

    A solution using only base R is the following where s is from above.

    xtabs(value ~ s + id, sample_frame)
    

    giving the following xtabs object:

       id
    s   x y z
      1 1 3 5
      2 2 4 6