Suppose I have the following directed acyclic graph (DAG) with each node having a weight of 1.
I am interested in calculating the accumulated sum of each node based on the value of its ancestor. Assuming as I said earlier that the weight of each node is 1, then this is what I would expect to get
This is what I tried to do:
library(tidygraph, quietly = TRUE)
library(tidyverse)
library(ggraph)
# create adjacencies
grafo_df <- tribble(
~from, ~to,
"C", "A",
"C", "B",
"A", "D",
"B", "D")
# create the graph
grafo <- as_tbl_graph(grafo_df)
# calculate accumulated sum
grafo %>%
arrange(node_topo_order()) %>%
mutate(
revenue = 1,
cum_weight = map_dfs(1, .f = function(node, path, ...) {
sum(.N()$revenue[c(node, path$node)])
})) %>%
as_tibble() %>%
unnest("cum_weight")
#> # A tibble: 4 x 3
#> name revenue cum_weight
#> <chr> <dbl> <dbl>
#> 1 C 1 1
#> 2 A 1 2
#> 3 B 1 2
#> 4 D 1 3
Created on 2021-05-13 by the reprex package (v2.0.0)
As you can see, the accumulated sum of D results in 3 and not 4, because the value of D should be the sum of the accumulated value of A and B. I do not understand why D does not add 4
I have tried to understand the solution given here, but had a hard time understanding it
How can I get the accumulated sum?
Update # 1
I am not concerned (for the moment) with the complexity of the algorithm, that is, if the algorithm does it in O(V + E) it is not relevant.
Something important that is mentioned in this question is about the problem of counting twice, that is, the partial sum of the value of A is equal to C(1) + A(1) = 2, and the partial sum of the value of B is equal to C(1) + B (1) = 2, so to say that the value of D is not equal to the partial sums of A (2) + B(2) because the value of C would be duplicating I think it does not apply in this situation due to the following:
Let's imagine that each of these 4 nodes (A, B, C and D) are internet nodes that generate revenue of $1 each, so the total accumulated income of the 4 nodes would be $4. If D is the convergence node of the rest of nodes, then in a scenario where D stops working, the income of the remaining nodes and that of D would no longer be possible, therefore, its value is $4.
Update # 2
If I add a new path from C to D then the value of D should always be 4 because the number of dependent nodes is maintained, that is, what should matter is the number of dependent nodes in the accumulated sum. For example, in the solution proposed by @ThomasIsCoding, if I add this new path, the value of D is now 5, I think partly that their algorithm uses the degrees as a parameter to calculate the cumulative sum, however, if I add a additional node then the calculation is correct.
Update # 3
The example that I have placed is simple with the intention that it is easy to understand the objective, however, I did not specify that it should be generalizable for a graph with many nodes with three different topologies. The outermost layers are trees, the middle layers are rings, and the innermost layer is a full mesh.
Here is an igraph
option using distance
with argument mode = "in"
revenue=1
for all nodesg <- graph_from_data_frame(grafo_df)
data.frame(name = names(V(g))) %>%
mutate(revenue = 1) %>%
mutate(cum_weight = rowSums((!is.infinite(distances(g, mode = "in"))) %*% diag(revenue)))
which gives you
name revenue cum_weight
1 C 1 1
2 A 1 3
3 B 1 2
4 F 1 1
5 D 1 5
data.frame(name = names(V(g))) %>%
mutate(revenue = 1:n()) %>%
mutate(cum_weight = rowSums((!is.infinite(distances(g, mode = "in"))) %*% diag(revenue)))
which gives you
name revenue cum_weight
1 C 1 1
2 A 2 7
3 B 3 4
4 F 4 4
5 D 5 15
Data
grafo_df <- tribble(
~from, ~to,
"C", "A",
"C", "B",
"A", "D",
"C", "D",
"B", "D",
"F", "A"
)
and the DAG by plot(g)
is given as