
Find connections between values of one column in R and build a network graph


I want to count how many values the rows of a column have in common with each other.

This is what my dataframe looks like:

Location Manager
L1 M45
L2 M45
L34 M12
L5 M45
L23 M12
L4 M3
L11 M45

I want to create a new dataframe with two columns: Location and Links. The Links column should contain all the other locations that share the same manager. So, since L1, L2, L5 and L11 have a common manager, they should be linked together, and so on.

Location Links
L1 L2,L5,L11
L2 L1,L5,L11
L34 L23
L5 L1,L2,L11
L23 L34
L4
L11 L1,L2,L5

After this, can we create a network graph?

Thanks!


Solution

  • For the first part (getting all locations covered by a manager in a single row) we can do:

    library(dplyr)
    
    df %>%
      group_by(Manager) %>%
      summarize(Location = paste(Location, collapse = ", "))
    #> # A tibble: 3 x 2
    #>   Manager Location       
    #>   <chr>   <chr>          
    #> 1 M12     L34, L23       
    #> 2 M3      L4             
    #> 3 M45     L1, L2, L5, L11
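
The summary above has one row per manager, whereas the question asks for one row per location. A minimal base R sketch of that shape follows, assuming the comma-separated format shown in the question. The names `by_manager` and `links` are introduced here for illustration and are not part of the original answer.

```r
# Question data, repeated so the sketch runs on its own.
df <- data.frame(Location = c("L1", "L2", "L34", "L5", "L23", "L4", "L11"),
                 Manager  = c("M45", "M45", "M12", "M45", "M12", "M3", "M45"),
                 stringsAsFactors = FALSE)

# Locations grouped by manager, e.g. by_manager[["M12"]] is c("L34", "L23").
by_manager <- split(df$Location, df$Manager)

# For each row, collect the other locations that share its manager.
links <- data.frame(
  Location = df$Location,
  Links = mapply(
    function(loc, mgr) paste(setdiff(by_manager[[mgr]], loc), collapse = ","),
    df$Location, df$Manager,
    USE.NAMES = FALSE
  ),
  stringsAsFactors = FALSE
)

links
```

A location whose manager has no other locations (L4 here) ends up with an empty string in `Links`.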
    

    Your original data frame is already in the correct format to make a graph, because as_tbl_graph() treats the first two columns of a data frame as the two ends of each edge:

    plot(tidygraph::as_tbl_graph(df))
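
That graph connects each location to its manager. If the aim is instead to connect locations directly whenever they share a manager, one possible approach is to build an edge list of location pairs first. The sketch below rests on that reading of "linked together", which is an assumption rather than something the original answer states. The names `pair_list`, `edges` and `g` are illustrative, and the igraph step only runs if that package is installed.

```r
# Question data, repeated so the sketch runs on its own.
df <- data.frame(Location = c("L1", "L2", "L34", "L5", "L23", "L4", "L11"),
                 Manager  = c("M45", "M45", "M12", "M45", "M12", "M3", "M45"),
                 stringsAsFactors = FALSE)

# One data frame of unordered location pairs per manager.
pair_list <- lapply(split(df$Location, df$Manager), function(locs) {
  if (length(locs) < 2) return(NULL)   # a lone location has no links
  pairs <- t(combn(locs, 2))           # one row per pair
  data.frame(from = pairs[, 1], to = pairs[, 2], stringsAsFactors = FALSE)
})

# Drop managers with no pairs, then stack the rest into one edge list.
edges <- do.call(rbind, Filter(Negate(is.null), pair_list))
rownames(edges) <- NULL

# Optional: build and plot an undirected graph, keeping unlinked locations
# (such as L4) as isolated nodes via the vertices argument.
if (requireNamespace("igraph", quietly = TRUE)) {
  g <- igraph::graph_from_data_frame(edges, directed = FALSE,
                                     vertices = data.frame(name = df$Location))
  plot(g)
}
```

Each manager with n locations contributes n * (n - 1) / 2 edges, so a manager with many locations produces a dense cluster. For large groups the manager-to-location layout may stay more readable.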
    

    If you want a prettier representation of the graph, you could use ggraph, for example:

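    library(ggraph)
    
    # Swap the columns so each edge runs Manager -> Location, then add a
    # dummy "Managers" root that joins the managers into a single tree.
    df[2:1] %>%
      rbind(data.frame(Manager = "Managers", Location = unique(df$Manager))) %>%
      tidygraph::as_tbl_graph() %>%
      ggraph(circular = TRUE) +
      geom_edge_bend() +
      # Radius 0 and an empty label hide the dummy root node.
      geom_node_circle(aes(r = ifelse(name == "Managers", 0, 0.1),
                           fill = substr(name, 1, 1))) +
      geom_node_text(aes(label = ifelse(name == "Managers", "", name))) +
      # Fill levels sort as "L", "M", so colours and labels follow that order.
      scale_fill_manual(values = c("deepskyblue", "gold"),
                        labels = c("Locations", "Managers"),
                        name = NULL) +
      theme_void(base_size = 16) +
      coord_equal()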
    


    Question data in reproducible format

    df <- data.frame(Location = c("L1", "L2", "L34", "L5", "L23", "L4", "L11"), 
                     Manager = c("M45", "M45", "M12", "M45", "M12", "M3", "M45"))
    

    Created on 2022-08-31 with reprex v2.0.2