For a dataset like:
21 79
78 245
21 186
65 522
4 21
3 4
4 212
4 881
124 303
28 653
28 1231
7 464
7 52
17 102
16 292
65 837
28 203
28 1689
136 2216
7 1342
56 412
I need to find the number of associated patterns. For example 21-79 and 21-186 have 21 in common. So they form 1 pattern. Also 21 is present in 4-21. This edge also contributes to the same pattern. Now 4-881, 4-212, 3-4 have 4 in their edge. So also contribute to the same pattern. Thus edges 21-79, 21-186, 4-21, 4-881, 4-212, 3-4 form 1 pattern. Similarly there are other patterns. Thus we need to group all edges that have any 1 node common to form a pattern (or subgraph). For the dataset given there are total 4 patterns.
I need to write code (preferably in R) that will find such no. of patterns.
Since you're describing the data as subgraphs, why not use the igraph
package which is very knowledgeable about graphs. So here's your data in data.frame form
dd <- structure(list(V1 = c(21L, 78L, 21L, 65L, 4L, 3L, 4L, 4L, 124L,
28L, 28L, 7L, 7L, 17L, 16L, 65L, 28L, 28L, 136L, 7L, 56L), V2 = c(79L,
245L, 186L, 522L, 21L, 4L, 212L, 881L, 303L, 653L, 1231L, 464L,
52L, 102L, 292L, 837L, 203L, 1689L, 2216L, 1342L, 412L)), .Names = c("V1",
"V2"), class = "data.frame", row.names = c(NA, -21L))
We can treat each value as a vertex name so the data you provide is really like an edge list. Thus we create our graph with
library(igraph)
gg <- graph.edgelist(cbind(as.character(dd$V1), as.character(dd$V2)),
directed=F)
That defines the nodes and vertex resulting in the following graph (plot(gg)
)
Now you wanted to know the number of "patterns" which are really represented as connected subgraphs in this data. You can extract that information with the clusters()
command. Specifically,
clusters(gg)$no
# [1] 10
Which shows there are 10 clusters in the data you provided. But you only want the ones that have more than two vertices. That we can get with
sum(clusters(gg)$csize>2)
# [1] 4
Which is 4 as you were expecting.