r, dataframe, igraph

Contract a dataframe of an edge list by summing the contracted edge weights from/to two nodes


I have a dataframe df that contains edge weights between pairs of nodes:

df <- data.frame(c("A","A","B","B","C","C"),
c("B","C","A","C","A","B"),
c(2,3,6,4,9,1))
colnames(df) <- c("node_from", "node_to", "weight")

print(df)
# Output:
  node_from node_to  weight
1     A     B       2
2     A     C       3
3     B     A       6
4     B     C       4
5     C     A       9
6     C     B       1

I would like to contract this dataframe by merging nodes A and B and summing all edge weights to and from these nodes with any other node, in this case C only. The result should be an edge list where the edges between A and B have disappeared and AB is now one node:

# some code to merge nodes A and B

print(df_contracted)
# Output:
  node_from node_to weight
1    AB     C      7
2     C    AB      10

Is there a way to do this efficiently for larger dataframes?

I could convert the dataframe to an actual graph using graph_from_data_frame from the igraph package and then use the contract function, but since I have to do this operation many times, I'd rather not convert it to a graph and back again every time.


Solution

  • Here's a dplyr solution:

    library(dplyr)
    to.merge <- c('A', 'B')
    merged.name <- paste(to.merge, collapse='')
    
    df %>%
      mutate(across(c(node_from, node_to), 
                    ~ if_else(.x %in% to.merge, merged.name, .x))) %>%
      group_by(node_from, node_to) %>%
      summarise(weight = sum(weight), .groups = "drop") %>%
      filter(node_from != node_to)
    # # A tibble: 2 × 3
    #   node_from node_to weight
    #   <chr>     <chr>    <dbl>
    # 1 AB        C            7
    # 2 C         AB          10
    

    It renames every node_from and node_to value that is "A" or "B" to "AB", groups rows with the same combination of node_from and node_to, sums the weights within each group, and finally removes the AB<->AB self-loop.
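
Since the question mentions repeating this operation many times, the same steps could be wrapped in a reusable function. The following is a minimal base R sketch, not part of the original answer: the name contract_edges and its arguments are hypothetical, and it assumes the columns are called node_from, node_to and weight as in the question. It drops self-loops before summing, which gives the same result as filtering afterwards.

```r
# Hypothetical helper (not from the original answer): merge the nodes listed
# in `to_merge` into one node and sum the weights of the resulting edges.
contract_edges <- function(edges, to_merge,
                           merged_name = paste(to_merge, collapse = "")) {
  # as.character() guards against factor columns (the default before R 4.0)
  from <- as.character(edges$node_from)
  to   <- as.character(edges$node_to)

  # Relabel every endpoint that belongs to the merged set
  from[from %in% to_merge] <- merged_name
  to[to %in% to_merge]     <- merged_name

  relabelled <- data.frame(node_from = from, node_to = to,
                           weight = edges$weight,
                           stringsAsFactors = FALSE)

  # Drop edges that now start and end at the merged node (self-loops)
  relabelled <- relabelled[relabelled$node_from != relabelled$node_to, ,
                           drop = FALSE]

  # aggregate() stops with an error on zero rows, so return early in that case
  if (nrow(relabelled) == 0) return(relabelled)

  # Sum the weights of all edges sharing the same (node_from, node_to) pair
  aggregate(weight ~ node_from + node_to, data = relabelled, FUN = sum)
}

df <- data.frame(node_from = c("A", "A", "B", "B", "C", "C"),
                 node_to   = c("B", "C", "A", "C", "A", "B"),
                 weight    = c(2, 3, 6, 4, 9, 1))

df_contracted <- contract_edges(df, c("A", "B"))
print(df_contracted)
```

For the example data this should leave two rows, AB -> C with weight 7 and C -> AB with weight 10; the row order produced by aggregate() may differ from the dplyr output. Because the result has the same three columns as the input, it can be passed straight back into the function for a further contraction.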