
How to rescale the plot to push the clusters (nodes) a bit further apart and name the clusters in igraph?


I have node and edge data and am trying to make a network plot from it. The nodes data frame has 1552 rows.

The edges data frame has four columns and 1203576 entries.

Using the nodes and edges data, I used the code below to make a network plot.

library(igraph)
net <- graph_from_data_frame(d=edges, vertices=nodes, directed=F)

plot(net, edge.arrow.size=.4,vertex.label=NA, 
     vertex.color=as.numeric(factor(nodes$type)))

Grouped.net = net
E(Grouped.net)$weight = 1

colnames(nodes)[4] <- "Clusters"

## Add edges with high weight between all nodes in the same group
for(Clus in unique(nodes$Clusters)) {
  GroupV = which(nodes$Clusters == Clus)
  Grouped.net = add_edges(Grouped.net, combn(GroupV, 2), attr=list(weight=500))
} 


## Now create a layout based on Grouped.net
set.seed(567)
LO = layout_with_fr(Grouped.net)

# Generate colors based on media type:
colrs <- c("gray50", "yellow", "tomato")
V(net)$color <- colrs[V(net)$type_num]


plot(net, layout=LO, edge.arrow.size=0,vertex.label=NA, asp=0, vertex.size=4)
legend(x=-1.5, y=-1.1, c("typeA","typeB", "typeC"), pch=21,
       col="#777777", pt.bg=colrs, pt.cex=2, cex=.8, bty="n", ncol=1)

The plot I got looks like this:

[Figure: network plot produced by the code above]

In the above figure there are 5 clusters.

  1. How do I increase the space between the clusters, i.e. move them farther apart? And how can I adjust the edges? They look odd.

  2. How do I name the clusters in the figure?

  3. How do I bring the typeC nodes to the top? There are very few of them, and because the typeA nodes are so numerous, the typeC nodes end up hidden underneath.


Solution

  • You have several questions. I will try to answer them all, but in a different order.

    Setup

    library(igraph)
    edges = read.csv("temp/edges_info_5Clusters.csv", stringsAsFactors=TRUE)
    nodes = read.csv("temp/nodes_info_5Clusters.csv", stringsAsFactors=TRUE)
    

    Question 3. How to bring the nodes typeC to the top?
    The nodes are plotted in order of node number, so later nodes are drawn on top of earlier ones. To keep the infrequent types visible, those nodes need the highest node numbers. So just sort on the types to force the nodes into the order typeA, typeB, typeC.

    nodes = nodes[order(nodes$type),]
    net <- graph_from_data_frame(d=edges, vertices=nodes, directed=FALSE)
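
    The effect of the sort can be checked on a toy example. This is a minimal sketch with made-up data (toy_nodes, toy_edges and toy_net are hypothetical names, not part of the original data): after sorting by type, the rare typeC node ends up with the highest vertex index and is therefore drawn last.

    ```r
    library(igraph)

    # Toy data (hypothetical): five nodes, typeC is the rare type
    toy_nodes <- data.frame(
      id   = c("n1", "n2", "n3", "n4", "n5"),
      type = factor(c("typeC", "typeA", "typeB", "typeA", "typeA"))
    )
    toy_edges <- data.frame(from = c("n1", "n2", "n3"),
                            to   = c("n2", "n3", "n4"))

    # Sort so that typeA comes first and typeC last
    toy_nodes <- toy_nodes[order(toy_nodes$type), ]
    toy_net <- graph_from_data_frame(d = toy_edges, vertices = toy_nodes,
                                     directed = FALSE)

    # Vertex order follows the row order of the sorted data frame:
    # "n2" "n4" "n5" "n3" "n1"
    print(V(toy_net)$name)
    ```

    Because the edge list refers to nodes by name, reordering the rows of the nodes data frame does not change which nodes are connected.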
    

    I will just go directly to the grouped plotting that you had in your code to show the result.

    Grouped.net = net
    E(Grouped.net)$weight = 1
    colnames(nodes)[4] <- "Clusters"
    
    ## Add edges with high weight between all nodes in the same group
    for(Clus in unique(nodes$Clusters)) {
      GroupV = which(nodes$Clusters == Clus)
      Grouped.net = add_edges(Grouped.net, combn(GroupV, 2), attr=list(weight=500))
    } 
    
    ## Now create a layout based on Grouped.net
    set.seed(567)
    LO = layout_with_fr(Grouped.net)
    
    colrs <- c("gray50", "yellow", "tomato")
    V(net)$color <- colrs[V(net)$type_num]
    
    plot(net, layout=LO, edge.arrow.size=0,vertex.label=NA, vertex.size=4,
        edge.color="lightgray")
    legend(x=-1.5, y=-1.1, c("typeA","typeB", "typeC"), pch=21,
           col="#777777", pt.bg=colrs, pt.cex=2, cex=.8, bty="n", ncol=1)
    

    Network Graph - version 1

    OK, now the typeC and typeB nodes are much more visible, but the five clusters are laid out poorly. To get something more like your second (example) graph, we need to construct the layout hierarchically: lay out the clusters first, and separately lay out the points within each cluster. The layout for the five clusters is simple.

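    ## One vertex per cluster, placed on a circle and scaled by Stretch
    F5 = make_full_graph(5)
    Stretch = 6
    LO_F5 = Stretch*layout_in_circle(F5)   # layout.circle is the older name
    plot(F5, layout=LO_F5)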
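
    The cluster centers do not strictly need a helper graph. As a minimal sketch, assuming all that is wanted is a set of equally spaced points on a circle (n_clusters, stretch and centers are names chosen here for illustration), the same kind of matrix can be built with base R:

    ```r
    # Assumed values for illustration; the answer uses 5 clusters and Stretch = 6
    n_clusters <- 5
    stretch <- 6

    # Equally spaced angles around the circle, one per cluster
    angles <- 2 * pi * (seq_len(n_clusters) - 1) / n_clusters

    # One row per cluster center: (x, y) on a circle of radius `stretch`
    centers <- stretch * cbind(x = cos(angles), y = sin(angles))
    print(round(centers, 2))
    ```

    This makes it easy to change the number of clusters, for example to nlevels(nodes$Clusters), without rebuilding a graph.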
     
    

    Layout for clusters

    Now we need to lay out the points in each cluster and space the clusters out using the cluster layout just created. But there is a tradeoff here. If you make the clusters far apart, all of the nodes will be small and hard to see. If you want the nodes bigger, you need to make the clusters closer together (so that they all fit on the plot). You have so many links that, no matter what you do, they will blur together into a gray background. I picked a middle ground that appealed to me, but I invite you to explore different values of the factor Stretch. Bigger values of Stretch will make the clusters farther apart with smaller nodes. Smaller values will make the clusters closer together with larger nodes. Pick something that works for you.

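    set.seed(1234)
    ## One (x, y) row per vertex, filled in cluster by cluster
    HierLO = matrix(0, ncol=2, nrow=vcount(net))
    for(i in seq_along(levels(nodes$Clusters))) {
        ## Vertices belonging to the i-th cluster
        CLUST = which(nodes$Clusters == levels(nodes$Clusters)[i])
        SubNet = induced_subgraph(net, V(net)[CLUST])
        ## Lay out the cluster on its own, centered and scaled to unit spread
        LO_SN = scale(layout_nicely(SubNet))
        ## Shift the cluster to its center from LO_F5
        HierLO[CLUST, ] = LO_SN + 
            matrix(LO_F5[i,], nrow=vcount(SubNet), ncol=2,byrow=TRUE)
    }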
    
    plot(net, layout=HierLO, edge.arrow.size=0,vertex.label=NA, vertex.size=4,
        edge.color="lightgray")
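
    To try several values of Stretch without editing the loop each time, the steps above could be wrapped in a function. The following is only a sketch on a toy graph: hier_layout is a hypothetical helper name, the toy graph comes from sample_islands, and the cluster centers are placed on a circle with base R rather than taken from LO_F5.

    ```r
    library(igraph)

    # Hypothetical helper: lay out each cluster on its own, then shift it to a
    # center on a circle of radius `stretch`.
    # Caveat: scale() returns NaN for a cluster with a single vertex.
    hier_layout <- function(g, clusters, stretch = 6) {
      clusters <- factor(clusters)
      k <- nlevels(clusters)
      angles <- 2 * pi * (seq_len(k) - 1) / k
      centers <- stretch * cbind(cos(angles), sin(angles))
      lo <- matrix(0, nrow = vcount(g), ncol = 2)
      for (i in seq_len(k)) {
        idx <- which(clusters == levels(clusters)[i])
        sub <- induced_subgraph(g, idx)
        sub_lo <- scale(layout_nicely(sub))        # centered, unit spread
        lo[idx, ] <- sweep(sub_lo, 2, centers[i, ], "+")
      }
      lo
    }

    # Toy graph: 3 dense groups of 10 vertices with a few links between groups
    set.seed(1234)
    g  <- sample_islands(3, 10, 0.6, 2)
    cl <- rep(1:3, each = 10)

    # Compare a tight and a wide spacing side by side
    op <- par(mfrow = c(1, 2))
    plot(g, layout = hier_layout(g, cl, stretch = 2), vertex.label = NA,
         vertex.size = 6, main = "stretch = 2")
    plot(g, layout = hier_layout(g, cl, stretch = 8), vertex.label = NA,
         vertex.size = 6, main = "stretch = 8")
    par(op)
    ```

    Since each cluster layout is centered before it is shifted, the centroid of every cluster lands exactly on its circle position, which is what makes the spacing controllable through a single number.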
    

    Network Graph - Version 2

    You can now see all of the typeC nodes and most of the typeB nodes (except in cluster 1, where there are a lot of typeB nodes).

    Finally, let's add cluster labels. These just need to be placed relative to the cluster centers. Those centers are roughly given by the layout LO_F5, but igraph plotting rescales the layout so that each axis of the plot actually spans the range (-1,1). We can rescale LO_F5 to that range ourselves and then stretch the positions a little so that the labels will be just outside the circle.

    LO_Text = LO_F5
    LO_Text[,1] = 2*(LO_F5[,1] - min(LO_F5[,1]))/(max(LO_F5[,1]) - min(LO_F5[,1])) -1
    LO_Text[,2] = 2*(LO_F5[,2] - min(LO_F5[,2]))/(max(LO_F5[,2]) - min(LO_F5[,2])) -1
    text(1.2*LO_Text, labels=levels(nodes$Clusters))
    legend(x=-1.5, y=-1.1, c("typeA","typeB", "typeC"), pch=21,
           col="#777777", pt.bg=colrs, pt.cex=2, cex=.8, bty="n", ncol=1)
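
    The rescaling above uses the range of LO_F5 itself, so the label positions are approximate. If more exact positions are wanted, one option is to map the centers with the range of the full layout that was actually plotted. This is a sketch under the assumption that the plot was drawn with igraph's default rescaling; rescale_like is a hypothetical helper, and the numbers are toy values.

    ```r
    # Hypothetical helper: apply to `pts` the per-axis linear map that sends the
    # columns of `ref` onto [-1, 1].
    rescale_like <- function(pts, ref) {
      out <- pts
      for (j in seq_len(ncol(ref))) {
        rng <- range(ref[, j])
        out[, j] <- 2 * (pts[, j] - rng[1]) / (rng[2] - rng[1]) - 1
      }
      out
    }

    # Toy numbers: a layout spanning [-8, 8] x [-7, 9] and two cluster centers
    layout_xy <- rbind(c(-8, -7), c(8, 9), c(0, 1))
    centers   <- rbind(c(6, 1), c(-6, 1))

    # Centers expressed in plot coordinates: (0.75, 0) and (-0.75, 0)
    print(rescale_like(centers, layout_xy))
    ```

    With the objects from this answer, the call would look like text(1.2*rescale_like(LO_F5, HierLO), labels=levels(nodes$Clusters)), where the factor 1.2 is again just a nudge to move the labels away from the nodes.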
    

    Network Graph - Version 3

    The links are still a problem, but I think this addresses your other questions.
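
    One thing that may soften the links is to draw them thin and nearly transparent, so that overlapping edges build up gradually instead of forming a solid gray block. This is only a sketch on a toy random graph (g and faint are illustrative names, and the alpha value is a guess that would need tuning for over a million edges); it also assumes a graphics device that supports transparency.

    ```r
    library(igraph)

    # Toy graph with many edges relative to its size
    set.seed(42)
    g <- sample_gnp(60, 0.5)

    # Nearly transparent gray built with base R's adjustcolor()
    faint <- adjustcolor("gray40", alpha.f = 0.1)

    plot(g, layout = layout_with_fr(g), vertex.label = NA, vertex.size = 4,
         edge.color = faint, edge.width = 0.5)
    ```

    In the plots above, the same idea would mean replacing edge.color="lightgray" with a transparent color and adding a small edge.width.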