I have constructed a gene co-expression network from RNA-seq data. The network is stored as an edge list file of about 1 GB. It was created by calculating the Pearson correlation for each gene pair and keeping only the pairs with a correlation greater than 0.95.
I clustered this network (edge list) with the "cluster_louvain" community detection algorithm from the igraph R package and obtained 534 subclusters. Many of the subclusters contain only a single vertex.
How can I score the clusters to identify the best ones, i.e. those that have the most vertices and edges and are important for further study?
You do not provide any data, so I will illustrate with an arbitrary example.
library(igraph)
set.seed(1234)
g = erdos.renyi.game(20,0.1)
plot(g)
CL = cluster_louvain(g)
plot(g, vertex.color=CL$membership)
Now you can get the number of vertices in each cluster and the number of edges that connect them.
## number of vertices per cluster
table(CL$membership)
1 2 3 4 5 6 7
1 1 3 2 3 5 5
## number of edges within each cluster
NumClust = max(CL$membership)
sapply(1:NumClust, function(i)
ecount(induced_subgraph(g, which(CL$membership==i))))
[1] 0 0 2 1 2 4 5
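To turn these two counts into a ranking, one option is to collect them in a single table, add the internal edge density, and sort. The following is only a sketch under some assumptions: the graph is undirected and simple (so a cluster with n vertices can hold at most n(n-1)/2 edges), `score_clusters` and `min_size` are hypothetical names introduced here, and sorting by edge count first is just one possible notion of "best". It uses `sample_gnp`, the current igraph name for the same random graph model as above.

```r
library(igraph)

set.seed(1234)
g  <- sample_gnp(20, 0.1)      # arbitrary example graph, as above
CL <- cluster_louvain(g)

# Hypothetical helper: one row per cluster with its size,
# number of internal edges, and internal edge density.
score_clusters <- function(graph, membership) {
  ids <- sort(unique(membership))
  stats <- lapply(ids, function(i) {
    sub <- induced_subgraph(graph, which(membership == i))
    n <- vcount(sub)
    m <- ecount(sub)
    data.frame(
      cluster  = i,
      vertices = n,
      edges    = m,
      # Assumes an undirected simple graph; undefined for singletons
      density  = if (n > 1) m / (n * (n - 1) / 2) else NA_real_
    )
  })
  out <- do.call(rbind, stats)
  # Largest clusters first: by internal edges, then by vertices
  out[order(-out$edges, -out$vertices), ]
}

scores <- score_clusters(g, CL$membership)

# Drop singletons and other tiny clusters; the threshold is arbitrary
min_size <- 3
candidates <- subset(scores, vertices >= min_size)
print(candidates)
```

For the real network, `min_size` and the sort order would have to be chosen to suit the analysis. Density is worth looking at alongside the raw counts, because a large cluster can still be sparsely connected.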