Tags: r, d3.js, data-visualization, htmlwidgets, networkd3

How to visualize simple network in R? With or without D3


I need to visualize the connections between developers in repo1 and repo2.

In particular, I need to show:

  • How some (not all) devs are active in both repos.
  • Some devs are more active than others: the number of commits ('comm') should be reflected in the node sizes in both repos.
  • Repos are distinguished by color.

I've used networkD3 to create a simple network, but I'm struggling with the next steps.

library(networkD3)

df <- data.frame(
  devs  = c("jeff", "jeff", "james", "james", "mary", "alfred", "maggie"),
  repos = c("repo1", "repo2", "repo1", "repo2", "repo1", "repo2", "repo1"),
  comm  = c("3", "3", "6", "6", "3", "3", "3")
)

simpleNetwork(df)

Desired output:

[Image: example of the desired network]

I am new to network graphs, so if this isn't doable with networkD3, I'm open to suggestions!


Solution

  • You can restructure your data into a links table and a nodes table (one node per developer per repo) and use networkD3::forceNetwork():

    library(dplyr)
    library(tidyr)
    
    # One link per developer who appears in both repos
    links <-
      df %>% 
      mutate(node = paste0(devs, "_", repos)) %>% 
      pivot_wider(id_cols = c(devs, comm), names_from = repos, values_from = node) %>% 
      filter(!is.na(repo1) & !is.na(repo2)) %>% 
      mutate(value = 1L) %>% 
      select(source = repo1, target = repo2, value)
    
    # One node per developer/repo pair, sized by commit count
    nodes <-
      df %>% 
      mutate(id = paste0(devs, "_", repos)) %>% 
      mutate(node_size = as.numeric(comm) * 30) %>% 
      select(id, name = devs, group = repos, node_size)
    
    # Convert node ids to 0-based row indices, as forceNetwork() expects
    links$source_id <- match(links$source, nodes$id) - 1L
    links$target_id <- match(links$target, nodes$id) - 1L
    
    # Tell forceNetwork() which columns hold which values
    forceNetwork(Links = links, Nodes = nodes, Source = "source_id", 
                 Target = "target_id", Value = "value", NodeID = "id", 
                 Nodesize = "node_size", Group = "group", opacity = 1L, 
                 opacityNoHover = 1L, fontSize = 14L)
    


    links

    #> # A tibble: 2 × 5
    #>   source      target      value source_id target_id
    #>   <chr>       <chr>       <int>     <int>     <int>
    #> 1 jeff_repo1  jeff_repo2      1         0         1
    #> 2 james_repo1 james_repo2     1         2         3
    

    source_id and target_id are the zero-based row indices of the source and target nodes in the nodes data frame, which is why 1 is subtracted from the result of match()
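
To make the zero-based indexing concrete, here is a minimal base-R sketch. The vectors `node_ids`, `link_sources` and `link_targets` are made up for illustration; only `match()` and the `- 1L` shift mirror the code above.

```r
# Node ids in the order they appear in the nodes data frame (illustrative subset)
node_ids <- c("jeff_repo1", "jeff_repo2", "james_repo1", "james_repo2")

# Link endpoints expressed as node ids
link_sources <- c("jeff_repo1", "james_repo1")
link_targets <- c("jeff_repo2", "james_repo2")

# match() returns 1-based positions; shift them to 0-based with - 1L
source_id <- match(link_sources, node_ids) - 1L
target_id <- match(link_targets, node_ids) - 1L

# An id that is missing from node_ids would yield NA, so check before plotting
stopifnot(!anyNA(source_id), !anyNA(target_id))

source_id  # 0 2
target_id  # 1 3
```

If the rows of the nodes data frame are reordered later, these indices have to be recomputed, since they refer to row positions rather than to the ids themselves.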

    value is the weight of the link; leave it at 1 unless you want some links to carry more weight than others
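
If you do want weighted links, one option is to derive the weight from the commit counts. The sketch below is only an assumption about what a sensible weight might be (the smaller of a developer's two commit counts); it is not part of the approach above, and the name `weighted_links` is hypothetical. It uses base R only.

```r
df <- data.frame(
  devs  = c("jeff", "jeff", "james", "james", "mary", "alfred", "maggie"),
  repos = c("repo1", "repo2", "repo1", "repo2", "repo1", "repo2", "repo1"),
  comm  = c("3", "3", "6", "6", "3", "3", "3")
)
df$comm <- as.numeric(df$comm)

# Split by repo, then keep only developers present in both
r1 <- df[df$repos == "repo1", c("devs", "comm")]
r2 <- df[df$repos == "repo2", c("devs", "comm")]
both <- merge(r1, r2, by = "devs", suffixes = c("_repo1", "_repo2"))

weighted_links <- data.frame(
  source = paste0(both$devs, "_repo1"),
  target = paste0(both$devs, "_repo2"),
  # Hypothetical weighting: the smaller of the two commit counts
  value  = pmin(both$comm_repo1, both$comm_repo2)
)
weighted_links
```

Here james gets a weight of 6 and jeff a weight of 3. The zero-based source_id and target_id columns would still need to be added with match() before passing this table to forceNetwork().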

    nodes

    #> # A tibble: 7 × 4
    #>   id           name   group node_size
    #>   <chr>        <chr>  <chr>     <dbl>
    #> 1 jeff_repo1   jeff   repo1        90
    #> 2 jeff_repo2   jeff   repo2        90
    #> 3 james_repo1  james  repo1       180
    #> 4 james_repo2  james  repo2       180
    #> 5 mary_repo1   mary   repo1        90
    #> 6 alfred_repo2 alfred repo2        90
    #> 7 maggie_repo1 maggie repo1        90
    

    id is the label displayed for each node in the plot because it is passed as NodeID; you could pass name instead, and several nodes may share the same name

    group is the group each node belongs to; forceNetwork() colours nodes by group, so here each repo gets its own colour (group names are arbitrary and may all be the same)
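
Since the repos should be distinguished by colour, it may help to pin the colours explicitly. The following is a sketch that assumes the `colourScale` and `legend` arguments of forceNetwork() and a D3 v4-style `d3.scaleOrdinal()`; the hex colours, the object name `repo_colours` and the two-node stand-in data are made up for illustration.

```r
library(networkD3)

# Minimal stand-in data: one developer active in both repos
nodes <- data.frame(
  id        = c("jeff_repo1", "jeff_repo2"),
  group     = c("repo1", "repo2"),
  node_size = c(90, 90)
)
links <- data.frame(source_id = 0L, target_id = 1L, value = 1L)

# Hypothetical palette; the domain must list the group names exactly
repo_colours <- JS(
  'd3.scaleOrdinal().domain(["repo1", "repo2"]).range(["#1f77b4", "#ff7f0e"])'
)

p <- forceNetwork(Links = links, Nodes = nodes, Source = "source_id",
                  Target = "target_id", Value = "value", NodeID = "id",
                  Nodesize = "node_size", Group = "group",
                  colourScale = repo_colours, legend = TRUE, opacity = 1)
p  # prints the widget in the viewer or browser
```

With `legend = TRUE` the plot shows which colour belongs to which repo, so the group names double as legend labels.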

    node_size sets the relative size of each node in the plot; here it is the commit count multiplied by 30
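
The multiplier matters because forceNetwork() does not use the size column as a radius directly. Assuming the documented default `radiusCalculation` of `Math.sqrt(d.nodesize)+6` (worth checking against your installed version with `?forceNetwork`), the rendered radii can be estimated in base R:

```r
comm <- c(3, 6)          # commit counts from the example data
node_size <- comm * 30   # same scaling as in the nodes table

# Mirrors the assumed default: JS(" Math.sqrt(d.nodesize)+6")
radius <- sqrt(node_size) + 6
round(radius, 1)  # 15.5 19.4
```

Because of the square root, doubling the commits does not double the radius. If the difference between nodes looks too subtle, a larger multiplier or a different JavaScript expression passed through `radiusCalculation` would widen it.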

    The column names in your links and nodes data frames must be passed explicitly in the forceNetwork() call, e.g.

    Source = "source_id", Target = "target_id", Value = "value", NodeID = "id", Nodesize = "node_size", Group = "group"
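
Because the columns are referenced by name and the links by row position, a typo or an out-of-range index can produce an empty or confusing plot. As a rough sketch, a small check can be run before plotting; `check_network_inputs()` is a hypothetical helper written for this illustration, not part of networkD3.

```r
# Hypothetical helper: checks that the named columns exist and that the
# 0-based link indices point at rows of the nodes data frame
check_network_inputs <- function(links, nodes, source, target,
                                 link_cols = character(),
                                 node_cols = character()) {
  missing_links <- setdiff(c(source, target, link_cols), names(links))
  missing_nodes <- setdiff(node_cols, names(nodes))
  if (length(missing_links) > 0 || length(missing_nodes) > 0) {
    stop("missing columns: ",
         paste(c(missing_links, missing_nodes), collapse = ", "))
  }
  idx <- c(links[[source]], links[[target]])
  if (anyNA(idx) || any(idx < 0) || any(idx > nrow(nodes) - 1)) {
    stop("link indices must lie between 0 and nrow(nodes) - 1")
  }
  invisible(TRUE)
}

# Illustrative stand-in data
nodes <- data.frame(id = c("a", "b"), group = c("g1", "g2"),
                    node_size = c(90, 180))
links <- data.frame(source_id = 0L, target_id = 1L, value = 1L)

check_network_inputs(links, nodes, "source_id", "target_id",
                     link_cols = "value",
                     node_cols = c("id", "node_size", "group"))
```

The call returns invisibly when everything lines up and stops with an error otherwise, for example when a link points at index 2 in a two-row nodes table.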