
Reshape data frame with different column lengths into two columns replicating column ID


I have the following data frame, with different row lengths:

myvar <- as.data.frame(rbind(c("Walter","NA","NA","NA","NA"),
                             c("Walter","NA","NA","NA","NA"),
                             c("Walter","Jesse","NA","NA","NA"),
                             c("Gus","Tuco","Mike","NA","NA"), 
                             c("Gus","Mike","Hank","Saul","Flynn")))
ID <- as.factor(c(1:5))   
data.frame(ID,myvar)

ID     V1    V2   V3   V4    V5
 1 Walter    NA   NA   NA    NA
 2 Walter    NA   NA   NA    NA
 3 Walter Jesse   NA   NA    NA
 4    Gus  Tuco Mike   NA    NA
 5    Gus  Mike Hank Saul Flynn

My goal is to reshape this data frame into a two-column data frame. The first column would be the ID and the second would be the character name. Note that each ID must correspond to the row in which the character was originally placed. I'm expecting the following result:

ID      V
1  Walter    
2  Walter
3  Walter
3  Jesse
4  Gus
4  Tuco
4  Mike
5  Gus
5  Mike
5  Hank
5  Saul
5  Flynn

I've tried dcast {reshape2}, but it didn't return what I need. Note that my original data frame is quite big. Any tips? Cheers.


Solution

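  • You could use unlist to stack the name columns into a single vector (ID is recycled to match its length), then drop the "NA" strings with subset: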

     res <- subset(data.frame(ID,value=unlist(myvar[-1], 
                                  use.names=FALSE)), value!='NA')
     res
     #   ID  value
     #1   1 Walter
     #2   2 Walter
     #3   3 Walter
     #4   4    Gus
     #5   5    Gus
     #6   3  Jesse
     #7   4   Tuco
     #8   5   Mike
     #9   4   Mike
     #10  5   Hank
     #11  5   Saul
     #12  5  Flynn
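
    The result above is ordered column by column, while the question asks for rows grouped by ID. A minimal sketch of one way to get that ordering, assuming the same data as in the question (with the missing entries stored as the string "NA"); the stringsAsFactors = FALSE arguments are an addition here so that the names stay character on older R versions:

```r
# Rebuild the question's data; missing entries are the string "NA"
myvar <- as.data.frame(rbind(c("Walter", "NA", "NA", "NA", "NA"),
                             c("Walter", "NA", "NA", "NA", "NA"),
                             c("Walter", "Jesse", "NA", "NA", "NA"),
                             c("Gus", "Tuco", "Mike", "NA", "NA"),
                             c("Gus", "Mike", "Hank", "Saul", "Flynn")),
                       stringsAsFactors = FALSE)
ID <- as.factor(1:5)
myvar <- data.frame(ID, myvar)

# Stack the name columns and drop the "NA" strings, as in the answer
res <- subset(data.frame(ID,
                         value = unlist(myvar[-1], use.names = FALSE),
                         stringsAsFactors = FALSE),
              value != "NA")

# Sort by ID; order() leaves ties in their existing order, so within
# each ID the names stay in their original column order (V1, V2, ...)
res <- res[order(res$ID), ]
rownames(res) <- NULL
res
```

    With this, the rows for ID 3 come out as Walter then Jesse, and so on, matching the layout requested in the question.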
    

    NOTE: The NAs in the dataset are character strings ("NA"). It is better to create the data without the quotes so that they are real NAs, which can then be removed with na.omit, is.na, complete.cases, etc.
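
    As a rough illustration of the note, here is the same data built with unquoted NA, so the missing entries are real missing values. The names long, res and res2 are only illustrative, and stringsAsFactors = FALSE is an assumption added for older R versions:

```r
# Same data as in the question, but with real NA instead of the string "NA"
myvar <- as.data.frame(rbind(c("Walter", NA, NA, NA, NA),
                             c("Walter", NA, NA, NA, NA),
                             c("Walter", "Jesse", NA, NA, NA),
                             c("Gus", "Tuco", "Mike", NA, NA),
                             c("Gus", "Mike", "Hank", "Saul", "Flynn")),
                       stringsAsFactors = FALSE)
ID <- as.factor(1:5)
myvar <- data.frame(ID, myvar)

# Stack the name columns; ID is recycled once per column
long <- data.frame(ID,
                   value = unlist(myvar[-1], use.names = FALSE),
                   stringsAsFactors = FALSE)

# Remove the rows with a missing name
res <- na.omit(long)

# The same filter written with is.na()
res2 <- long[!is.na(long$value), ]
```

    Both res and res2 keep only the rows that hold an actual name; complete.cases(long) could be used as the row filter in the same way.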

    data

    myvar <- data.frame(ID,myvar)