Firstly, I wish I could think of a better way to describe my problem. I'm trying to take the following example list:
lst<- list(c(2), c(1,3), c(2), c(4), c(6), c(5, 7), c(6,8), c(7))
And turn it into this:
res<- list(c(1,2,3), c(1,2,3), c(1,2,3), c(4), c(5,6,7,8), c(5,6,7,8),c(5,6,7,8), c(5,6,7,8))
OR even this:
also_good_answer<- list(c(1, 2, 3), logical(0), logical(0), 4, c(5, 6, 7, 8), logical(0),
logical(0), logical(0))
I've been using a couple while
statements in combination with setdiff
to do this, but I'm wondering if there is an eloquent and faster way to do this over a large list?
As always, thank you in advance. -nate
My method:
lst<- list(c(2), c(1,3), c(2), c(4), c(6), c(5, 7), c(6,8), c(7)) # Original List
to_sequence<- 1:length(lst)
res<- lapply(1:length(lst), function(x) {return(NA)}) # Building Result Object
while(length(to_sequence) > 0){
tres<- c()
idx<- to_sequence[1]
next_idxs<- setdiff(lst[[idx]], NA)
tres<- c(tres, next_idxs)
while(length(next_idxs) >=1){
next_idxs<- lapply(seq_along(next_idxs), function(x){
lst[[next_idxs[x]]]
} ) %>% unlist() %>% setdiff(., unlist(tres)) # uses slow setdiff
tres<-c(tres, next_idxs)
}
res[[idx]]<- tres
to_sequence<- setdiff(to_sequence, tres) # Another slow setdiff
cat("Length to_sequence:", base::prettyNum(length(to_sequence),big.mark = ","), "\n")
}
res<- lapply(res, sort)
EDITS
Updated lst
:
lst<- list(c(2), c(1,3), c(2,8), c(4), c(6), c(5, 7), c(6,9), c(3),c(7))
Giving Desired Output:
list(c(1, 2, 3, 8), logical(0), logical(0), 4, c(5, 6, 7, 9), logical(0), logical(0), logical(0), logical(0))
According to the updated example and expected output, we use replace
and duplicated
to nullify the cluster duplicates (see the last line out <- ...
)
library(igraph)
g <- graph_from_data_frame(stack(setNames(lst, seq_along(lst))))
clst <- membership(components(g))
nms <- as.integer(names(clst))
res <- unname(by(nms, clst, sort))[clst[order(nms)]]
out <- replace(res, duplicated(res), list(logical(0)))
and we can obtain
> out
[[1]]
[1] 1 2 3 8
[[2]]
logical(0)
[[3]]
logical(0)
[[4]]
[1] 4
[[5]]
[1] 5 6 7 9
[[6]]
logical(0)
[[7]]
logical(0)
[[8]]
logical(0)
[[9]]
logical(0)
You can use membership
from igraph
to detect the clusters
library(igraph)
g <- graph_from_data_frame(stack(setNames(lst, seq_along(lst))))
clst <- membership(components(g))
nms <- as.integer(names(clst))
res <- unname(by(nms, clst, sort))[clst[order(nms)]]
and you will obtain
> res
[[1]]
[1] 1 2 3
[[2]]
[1] 1 2 3
[[3]]
[1] 1 2 3
[[4]]
[1] 4
[[5]]
[1] 5 6 7 8
[[6]]
[1] 5 6 7 8
[[7]]
[1] 5 6 7 8
[[8]]
[1] 5 6 7 8
If you are interested in how the problem is interpreted into a graph, you can use plot(as.undirected(g))
, which shows