I am working with a large network containing thousands of nodes and edges. A reprex of the network can be found in a previous question, "Number of Connected Nodes in a dendrogram".
However, when counting the nodes within the network, I ran into a problem: the counts at one level do not add up to the count at the next level up. For example,
library(tidygraph)
library(ggraph)
library(tidyverse)
parent_child <- tribble(
  ~parent, ~child,
  "a", "b",
  "b", "c",
  "b", "d",
  "d", "e",
  "d", "f",
  "d", "g",
  "g", "z"
)
# converted to a dendrogram ------------
parent_child %>%
  as_tbl_graph() %>%
  ggraph(layout = "dendrogram") +
  geom_node_point() +
  geom_node_text(aes(label = name),
                 vjust = -1,
                 hjust = -1) +
  geom_edge_elbow()
# Table of calculations ----------------------
parent_child %>%
  as_tbl_graph() %>%
  activate(nodes) %>%
  mutate(n_community_out = local_size(order = graph_size(),
                                      mode = "out",
                                      mindist = 0)) %>%
  as_tibble()
# Final Output Table -----------------------
# A tibble: 8 x 2
  name  n_community_out
  <chr>           <dbl>
1 a                   8
2 b                   7
3 d                   5
4 g                   2
5 c                   1
6 e                   1
7 f                   1
8 z                   1
The table above shows the number of nodes reachable outward from each starting node. However, why do the counts at one level not add up to the count at the next level (node d + node c != node b)? I've been trying to explain this to colleagues, but I cannot adequately explain what the network is counting, or why adding up the counts at one position does not give the count at the next higher level.
This problem is exacerbated in a network with thousands of nodes, which is difficult to display. Does anyone know how to explain why the node counts do not add up to the next level? Any help is greatly appreciated.
You're running into an off-by-one error. Each count includes the node itself, so before comparing a parent's count with the sum of its children's counts, you need to subtract one from the parent's count.
For your example of
Node D + Node C ?= Node B
your table gives the values
...
2 b 7
3 d 5
...
5 c 1
...
You've intentionally set mindist = 0 so that when counting nodes from a parent, you include that node itself.
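To see the effect of that setting, here is a minimal sketch that computes both variants side by side. It assumes the same parent_child data as in the question; the column names n_with_self and n_descendants are made up for illustration. With mindist = 1 the starting node is left out, so the two columns should differ by exactly one for every node.

```r
library(tidygraph)
library(tibble)

# Same edge list as in the question
parent_child <- tribble(
  ~parent, ~child,
  "a", "b",
  "b", "c",
  "b", "d",
  "d", "e",
  "d", "f",
  "d", "g",
  "g", "z"
)

counts <- parent_child %>%
  as_tbl_graph() %>%
  activate(nodes) %>%
  mutate(
    # mindist = 0: the starting node counts itself (as in the question)
    n_with_self = local_size(order = graph_size(), mode = "out", mindist = 0),
    # mindist = 1: only nodes at least one step away, i.e. descendants
    n_descendants = local_size(order = graph_size(), mode = "out", mindist = 1)
  ) %>%
  as_tibble()

counts
```

If the n_descendants column is the quantity your colleagues expect, that may be the simpler number to report, though note that it does not add up level by level on its own either: each child still has to be counted once in its parent's total.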
Here's a quick visual to see the directionality.
library(tidygraph)
library(ggraph)
library(tidyverse)
parent_child <- tribble(
  ~parent, ~child,
  "a", "b",
  "b", "c",
  "b", "d",
  "d", "e",
  "d", "f",
  "d", "g",
  "g", "z"
)
plot(as_tbl_graph(parent_child))
Created on 2020-11-25 by the reprex package (v0.3.0)
Node C can't point to anything else, but because of mindist = 0, it counts itself, so its community size equals 1, as in your table.
Node D can reach 4 nodes (e, f, g, z), and when we count D itself, its local neighborhood is a total of 5 nodes.
Similarly, Node B will count all the nodes it's connected to, but also count itself.
So to get the actual counts to compare, you'll need to subtract one.
Node D + Node C
=> 5 + 1
=> 6
Node B = 7
=> 7 - 1
=> 6
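The same rule can be checked for every node at once. The following is a rough base R sketch, not part of the original reprex: it rebuilds the edge list as a plain data frame, and subtree_size is a hypothetical helper that counts a node plus everything below it. It assumes the graph is a tree, as in the example; if nodes can be reached by more than one path, the recursion would count them more than once.

```r
# Edge list from the question, as a plain data frame
edges <- data.frame(
  parent = c("a", "b", "b", "d", "d", "d", "g"),
  child  = c("b", "c", "d", "e", "f", "g", "z"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: number of nodes in the subtree rooted at `node`,
# counting `node` itself (the equivalent of mindist = 0)
subtree_size <- function(node) {
  children <- edges$child[edges$parent == node]
  1 + sum(vapply(children, subtree_size, numeric(1)))
}

nodes <- unique(c(edges$parent, edges$child))
sizes <- vapply(nodes, subtree_size, numeric(1))

# For each node, add up the counts of its direct children
children_sum <- vapply(nodes, function(node) {
  sum(sizes[edges$child[edges$parent == node]])
}, numeric(1))

# Every node's count is one more than the sum of its children's counts
data.frame(node = nodes, size = sizes, children_sum = children_sum)
all(sizes == 1 + children_sum)
```

The extra one in each row is the node counting itself, which is the piece that goes missing when the children's counts are added up by hand.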