Tags: r, dplyr, ggnetwork

Unique IDs for pairs based on two columns


Hi, I am trying to create a directed network based on this data:

ID Order Name
22 1 AA
22 2 BB
22 3 CC
33 1 AA
33 2 GG
44 1 AA
55 1 AA
55 2 BB

Order gives the direction in which the names should be connected within each ID, so for ID 22 the chain is AA -> BB -> CC. The width of each edge should be the number of times that connection occurs. So far I have been able to build a network, but it also counts and displays AA -> CC as a connection. My hope is that a table of unique pair IDs, like the one below, will let me display and count only the correct connections in the network.

ID Name
1 AA
1 BB
2 BB
2 CC
3 AA
3 GG
4 AA
4 AA
5 AA
5 BB

Any advice on how to achieve this would be great!


Solution

  • As I understand it, the network you want to plot looks like the table below. Within each ID, we keep only the consecutive steps (for example AA --> BB or BB --> CC), then count how often each step occurs across all IDs and store that count in weight.

    from to weight
    AA BB 2
    AA GG 1
    BB CC 1

    which can be plotted as a directed graph:

    [Image: plot of the resulting network]

    Code

    library(dplyr)
    library(igraph)
    
    # Sample data from the question (ID 55 has Order 1 and 2)
    data <- data.frame(
      ID    = c(22, 22, 22, 33, 33, 44, 55, 55),
      Order = c(1, 2, 3, 1, 2, 1, 1, 2),
      Name  = c("AA", "BB", "CC", "AA", "GG", "AA", "AA", "BB")
    )
    
    # edge list with consecutive connections only
    edge_weights <- data %>%
      arrange(ID, Order) %>%
      group_by(ID) %>%
      mutate(to = lead(Name)) %>%
      filter(!is.na(to)) %>%
      group_by(from = Name, to) %>%
      summarise(weight = n(), .groups = 'drop')
    
    g <- graph_from_data_frame(edge_weights, directed = TRUE)
    
    plot(g,
         layout = layout_with_fr(g),
         edge.width = E(g)$weight * 2,  # Make edge width proportional to weight
         edge.arrow.size = 0.5,
         vertex.size = 30,
         vertex.color = "lightblue",
         vertex.label.color = "black",
         vertex.label.cex = 1.2,
         main = "Directed Network with Connection Weights")
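
If you also want the pair table from the question, the same `lead()` step can be reused. This is only a sketch: `pair_id`, `pairs` and `pairs_long` are names I made up, and I am assuming that an ID with a single row (44 here) should produce no pair, whereas the table in the question lists it as AA/AA.

```r
library(dplyr)

# Sample data from the question
data <- data.frame(
  ID    = c(22, 22, 22, 33, 33, 44, 55, 55),
  Order = c(1, 2, 3, 1, 2, 1, 1, 2),
  Name  = c("AA", "BB", "CC", "AA", "GG", "AA", "AA", "BB")
)

# One row per consecutive pair, numbered in order of appearance
pairs <- data %>%
  arrange(ID, Order) %>%
  group_by(ID) %>%
  mutate(to = lead(Name)) %>%   # next Name within the same ID
  ungroup() %>%
  filter(!is.na(to)) %>%        # last row of each ID has no successor
  mutate(pair_id = row_number()) %>%
  select(pair_id, from = Name, to)

# Long layout as in the question: two rows (from, then to) per pair ID.
# c(rbind(a, b)) interleaves the two vectors: a1, b1, a2, b2, ...
pairs_long <- data.frame(
  ID   = rep(pairs$pair_id, each = 2),
  Name = c(rbind(pairs$from, pairs$to))
)

print(pairs_long)
```

Counting the rows of `pairs` by `from` and `to` gives the same weights as `edge_weights` above, so the long table is mainly useful if another tool expects that layout.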