data= data.frame(col1=c('m1','m1','m1','m2','m2', 'm2','m3', 'm3'), class=c('a','b','c','a','b','c', 'a', 'b'))
I have a data.frame with two columns: the first is a list of models, the second a list of model attributes. I need to show the combinations of models based on the attributes they share. I got the list of combinations using the 'by' function as follows:
data= data.frame(col1=c('m1','m1','m1','m2','m2','m3'), class=c('a','b','c','a','b','c'))
data.ls=by(data$col1, data$class,function(x) t(combn(x, 2)))
The output is exactly what I need, but I need it as a data.frame instead of a list, with the name of the 'class' that appears at the top of each list element placed in a third column:
# data$class: a
# [,1] [,2]
# [1,] m1 m2
# [2,] m1 m3
# [3,] m2 m3
# Levels: m1 m2 m3
So, I tried this:
as.data.frame(do.call("rbind",data.ls))
But the output shows only the combinations of 'col1' (as integer ids instead of the model names) and not the 'class' attribute, which was at the top of each list element in the 'by' output. The output of do.call looks like this:
# V1 V2
# 1 1 2
# 2 1 2
# 3 1 3
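A likely reason for the ids, sketched here under the assumption that 'col1' is a factor (the default for data.frame in R versions before 4.0): binding the factor matrices appears to drop the factor class, so only the underlying integer codes survive. Converting to character inside the function passed to 'by' keeps the model names. The names out and data.ls.chr are illustrative only.

```r
# Six-row example data; stringsAsFactors = TRUE is set explicitly so the
# behaviour is the same on R >= 4.0 as on older versions (assumption: the
# original col1 was a factor)
data <- data.frame(col1  = c('m1','m1','m1','m2','m2','m3'),
                   class = c('a','b','c','a','b','c'),
                   stringsAsFactors = TRUE)

# A factor stores integer codes plus a set of labels
as.integer(data$col1)    # the codes that showed up in the output above
as.character(data$col1)  # the labels that were wanted

# Convert to character before building the combinations
data.ls.chr <- by(data$col1, data$class,
                  function(x) t(combn(as.character(x), 2)))

out <- as.data.frame(do.call("rbind", data.ls.chr), stringsAsFactors = FALSE)
out
```

This only fixes the names; the 'class' column is still missing at this point.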
I also tried this:
do.call("rbind.data.frame",data.ls)
And got this error: Error in NextMethod() : invalid value
The final table should look like this:
data.final= data.frame(col1=c('m1','m1','m1'), col2=c('m2', 'm2', 'm3'), class=c('a','b','c'))
@Richard Scrivens proposed the following:
newDF <- data.frame(do.call(rbind, lapply(data.ls, as.character)), names(data.ls), row.names = NULL)
The output is:
X1 X2 X3 X4 X5 X6 names.data.ls.
1 m1 m1 m2 m2 m3 m3 a
2 m1 m1 m2 m2 m3 m3 b
3 m1 m2 m1 m2 m1 m2 c
The output in this format is, to me, less clear in terms of the combinations than the 'by' list.
Any help will be appreciated. Thanks.
You could avoid this by using tapply instead of by. Actually, tapply is the workhorse function for by. The following result just needs a cbind and as.data.frame, but you can see the point.
do.call(rbind, with(data, {
tapply(as.character(col1), class, function(x) c(combn(x, 2)))
}))
# [,1] [,2]
# a "m1" "m2"
# b "m1" "m2"
# c "m1" "m3"
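To get all the way to the three-column data.frame, one possible sketch is below. It keeps each class's combinations as a matrix with one pair per row, stacks the matrices, and repeats the class name once per pair, so it should also cope with classes that hold more than two models. The helper name pairs_by_class is made up for this example, and it assumes every class has at least two models (combn(x, 2) fails otherwise).

```r
# Hypothetical helper: one row per pair of models sharing a class
pairs_by_class <- function(df) {
  # One matrix per class; each row of a matrix is one combination
  pairs <- tapply(as.character(df$col1), df$class,
                  function(x) t(combn(x, 2)), simplify = FALSE)

  # Stack the matrices and repeat each class name once per row
  out <- data.frame(do.call(rbind, pairs),
                    class = rep(names(pairs), vapply(pairs, nrow, integer(1))),
                    row.names = NULL, stringsAsFactors = FALSE)
  names(out)[1:2] <- c("col1", "col2")
  out
}

# Six-row example data from the question
data <- data.frame(col1  = c('m1','m1','m1','m2','m2','m3'),
                   class = c('a','b','c','a','b','c'))

data.final <- pairs_by_class(data)
data.final
```

With the eight-row version of the data, classes a and b each contribute three rows and class c contributes one.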
For the same result, your by call can be changed a little.
do.call(rbind, lapply(by(data$col1, data$class, combn, 2), as.character))
# [,1] [,2]
# a "m1" "m2"
# b "m1" "m2"
# c "m1" "m3"