Search code examples
ralgorithmigraphgraph-theorytidygraph

Finding the cumulative sum of the value of nodes in a DAG


Suppose I have the following directed acyclic graph (DAG) with each node having a weight of 1.

simple DAG

I am interested in calculating the accumulated sum of each node based on the value of its ancestor. Assuming as I said earlier that the weight of each node is 1, then this is what I would expect to get

cumulative sum per node

This is what I tried to do:

 library(tidygraph, quietly = TRUE) 
 library(tidyverse)
 library(ggraph)

 # create adjacencies
 grafo_df <- tribble(
  ~from, ~to,
  "C", "A",
  "C", "B",
  "A", "D",
  "B", "D")
 
 # create the graph
 grafo <- as_tbl_graph(grafo_df)
 
 
 # calculate accumulated sum
 grafo %>% 
  arrange(node_topo_order()) %>% 
  mutate(
   
   revenue = 1,
   
   cum_weight = map_dfs(1, .f = function(node, path, ...) {
    
    sum(.N()$revenue[c(node, path$node)])
    
   })) %>% 
  as_tibble() %>% 
  unnest("cum_weight")
 
#> # A tibble: 4 x 3
#>   name  revenue cum_weight
#>   <chr>   <dbl>      <dbl>
#> 1 C           1          1
#> 2 A           1          2
#> 3 B           1          2
#> 4 D           1          3

Created on 2021-05-13 by the reprex package (v2.0.0)

As you can see, the accumulated sum of D results in 3 and not 4, because the value of D should be the sum of the accumulated value of A and B. I do not understand why D does not add 4

I have tried to understand the solution given here, but had a hard time understanding it

How can I get the accumulated sum?

Update # 1

I am not concerned (for the moment) with the complexity of the algorithm, that is, if the algorithm does it in O(V + E) it is not relevant.

Something important that is mentioned in this question is about the problem of counting twice, that is, the partial sum of the value of A is equal to C(1) + A(1) = 2, and the partial sum of the value of B is equal to C(1) + B (1) = 2, so to say that the value of D is not equal to the partial sums of A (2) + B(2) because the value of C would be duplicating I think it does not apply in this situation due to the following:

Let's imagine that each of these 4 nodes (A, B, C and D) are internet nodes that generate revenue of $1 each, so the total accumulated income of the 4 nodes would be $4. If D is the convergence node of the rest of nodes, then in a scenario where D stops working, the income of the remaining nodes and that of D would no longer be possible, therefore, its value is $4.

Update # 2

If I add a new path from C to D then the value of D should always be 4 because the number of dependent nodes is maintained, that is, what should matter is the number of dependent nodes in the accumulated sum. For example, in the solution proposed by @ThomasIsCoding, if I add this new path, the value of D is now 5, I think partly that their algorithm uses the degrees as a parameter to calculate the cumulative sum, however, if I add a additional node then the calculation is correct.

Update # 3

The example that I have placed is simple with the intention that it is easy to understand the objective, however, I did not specify that it should be generalizable for a graph with many nodes with three different topologies. The outermost layers are trees, the middle layers are rings, and the innermost layer is a full mesh.


Solution

  • Here is an igraph option using distance with argument mode = "in"

    • If your nodes are unweighted, i.e., revenue=1 for all nodes
    g <- graph_from_data_frame(grafo_df)
    
    data.frame(name = names(V(g))) %>%
      mutate(revenue = 1) %>%
      mutate(cum_weight = rowSums((!is.infinite(distances(g, mode = "in"))) %*% diag(revenue)))
    

    which gives you

      name revenue cum_weight
    1    C       1          1
    2    A       1          3
    3    B       1          2
    4    F       1          1
    5    D       1          5
    
    • If your nodes are weighted, e.g.,
    data.frame(name = names(V(g))) %>%
      mutate(revenue = 1:n()) %>%
      mutate(cum_weight = rowSums((!is.infinite(distances(g, mode = "in"))) %*% diag(revenue)))
    

    which gives you

      name revenue cum_weight
    1    C       1          1
    2    A       2          7
    3    B       3          4
    4    F       4          4
    5    D       5         15
    

    Data

    grafo_df <- tribble(
      ~from, ~to,
      "C", "A",
      "C", "B",
      "A", "D",
      "C", "D",
      "B", "D",
      "F", "A"
    )
    

    and the DAG by plot(g) is given as

    enter image description here