I have the following data frame, with different row lengths:
myvar <- as.data.frame(rbind(c("Walter","NA","NA","NA","NA"),
c("Walter","NA","NA","NA","NA"),
c("Walter","Jesse","NA","NA","NA"),
c("Gus","Tuco","Mike","NA","NA"),
c("Gus","Mike","Hank","Saul","Flynn")))
ID <- as.factor(c(1:5))
data.frame(ID,myvar)
ID V1 V2 V3 V4 V5
1 Walter NA NA NA NA
2 Walter NA NA NA NA
3 Walter Jesse NA NA NA
4 Gus Tuco Mike NA NA
5 Gus Mike Hank Saul Flynn
My goal is to switch this data frame into a two column data frame. The first column would be the ID and the other one would be the character name. Note that the ID must be correspondent to the row the character were originally placed. I'm expecting the following result:
ID V
1 Walter
2 Walter
3 Walter
3 Jesse
4 Gus
4 Tuco
4 Mike
5 Gus
5 Mike
5 Hank
5 Saul
5 Flynn
I've tried dcast {reshape2} but it doesn't returned what I need. It is noteworthy that my original data frame is quite big. Any tips? Cheers.
You could use unlist
res <- subset(data.frame(ID,value=unlist(myvar[-1],
use.names=FALSE)), value!='NA')
res
# ID value
#1 1 Walter
#2 2 Walter
#3 3 Walter
#4 4 Gus
#5 5 Gus
#6 3 Jesse
#7 4 Tuco
#8 5 Mike
#9 4 Mike
#10 5 Hank
#11 5 Saul
#12 5 Flynn
NOTE: The NAs
are 'character' elements in the dataset, it is better to create it without quotes so that it will be real NAs and we can remove it by na.omit
, is.na
, complete.cases
etc.
myvar <- data.frame(ID,myvar)