r, dplyr, ggraph, tidygraph

How do I set up data for tidygraph and ggraph?


I want to run a network analysis but am completely lost on how to structure my data correctly, since most examples start with data that is already in to/from form.

An example of my data looks like:

df <- data.frame(Name = c("Alice", "Ben", "Tom", "Jane", "Neil", "Alice", "Tom", "Ben", "Jane", "Neil", "Alice", "Tom", "Ben", "Jane", "Bob"),
         Location = c("Ward", "Desk", "Op", "Call", "Off",
                      "Ward", "Desk", "Op", "Call", "Off",
                      "Ward", "Desk", "Op", "Call", "Off"),
         Rating = c(1, 1, 1, 1, 1, 10, 10, 10, 10, 10, 8, 8, 8, 8, 8))

I now want to get the to and from combinations of people, as denoted by Name, for every Rating. Note that people can be at a different Location under a different Rating; I'd prefer Location, in combination with Name, to form the nodes and Rating to form the edges.

I have looked at library(iterpc) but am struggling to comprehend the whole combination thing, with five different lineups.

Is there a potential dplyr solution to my problem? Thank you!

EDIT: My question looks very similar to this, yet the accepted answer does not work for me; instead I get Error: Column name Name must not be duplicated.


Solution

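  • as_tbl_graph() treats the first two columns of a data frame as the from and to columns. With your data, Name becomes from, Location becomes to, and Rating is carried along as an edge attribute, so tidygraph does this mapping for you.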

    library(tidygraph)
    #> Warning: package 'tidygraph' was built under R version 3.6.3
    #> 
    #> Attaching package: 'tidygraph'
    #> The following object is masked from 'package:stats':
    #> 
    #>     filter
    
    df <- data.frame(
      Name = c(
        "Alice", "Ben", "Tom", "Jane", "Neil",
        "Alice", "Tom", "Ben", "Jane", "Neil",
        "Alice", "Tom", "Ben", "Jane", "Bob"
      ),
      Location = c(
        "Ward", "Desk", "Op", "Call", "Off",
        "Ward", "Desk", "Op", "Call", "Off",
        "Ward", "Desk", "Op", "Call", "Off"
      ),
      Rating = c(
        1, 1, 1, 1, 1,
        10, 10, 10, 10, 10,
        8, 8, 8, 8, 8)
    )
    
    tg <- as_tbl_graph(df)
    tg
    #> # A tbl_graph: 11 nodes and 15 edges
    #> #
    #> # A directed acyclic multigraph with 4 components
    #> #
    #> # Node Data: 11 x 1 (active)
    #>   name 
    #>   <chr>
    #> 1 Alice
    #> 2 Ben  
    #> 3 Tom  
    #> 4 Jane 
    #> 5 Neil 
    #> 6 Bob  
    #> # ... with 5 more rows
    #> #
    #> # Edge Data: 15 x 3
    #>    from    to Rating
    #>   <int> <int>  <dbl>
    #> 1     1     7      1
    #> 2     2     8      1
    #> 3     3     9      1
    #> # ... with 12 more rows
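
The edge table stores node indices rather than names. A minimal sketch for attaching the names to each edge, assuming the same data as above: it uses tidygraph's `.N()` helper to look up node data while the edges are active. The column names `from_name` and `to_name` are made up for illustration.

```r
library(tidygraph)
library(dplyr)

# Same data as in the question; stringsAsFactors = FALSE keeps the
# names as character vectors on R versions before 4.0.
df <- data.frame(
  Name = c(
    "Alice", "Ben", "Tom", "Jane", "Neil",
    "Alice", "Tom", "Ben", "Jane", "Neil",
    "Alice", "Tom", "Ben", "Jane", "Bob"
  ),
  Location = rep(c("Ward", "Desk", "Op", "Call", "Off"), times = 3),
  Rating = rep(c(1, 10, 8), each = 5),
  stringsAsFactors = FALSE
)

tg <- as_tbl_graph(df)

# While the edges are active, .N() gives access to the node table,
# so the integer indices can be translated back into node names.
edges_named <- tg %>%
  activate(edges) %>%
  mutate(
    from_name = .N()$name[from],  # illustrative column name
    to_name   = .N()$name[to]     # illustrative column name
  ) %>%
  as_tibble()

head(edges_named, 3)
```

Each row of `edges_named` should then read as a Name, a Location and the Rating that connects them.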
    

    You can double-check that this mapping is correct by looking at the first row of the edge table: it shows an edge between nodes 1 and 7, which are Alice and Ward, matching the first row of your original data frame.

    data.frame(tg)
    #>     name
    #> 1  Alice
    #> 2    Ben
    #> 3    Tom
    #> 4   Jane
    #> 5   Neil
    #> 6    Bob
    #> 7   Ward
    #> 8   Desk
    #> 9     Op
    #> 10  Call
    #> 11   Off
    

    Created on 2020-09-21 by the reprex package (v0.3.0)
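
If what you are after is person-to-person edges within each Rating, as the question describes, one possible approach is to join the data frame to itself on Rating and keep each pair once. This is only a sketch under some assumptions: nodes are identified by Name alone (a combined Name/Location key could be substituted), the edges are undirected, and base `merge()` is used for the self-join rather than a dplyr join. The names `pairs` and `people_graph` are made up for illustration.

```r
library(dplyr)
library(tidygraph)

# Same data as in the question; stringsAsFactors = FALSE so that
# names can be compared as strings on R versions before 4.0.
df <- data.frame(
  Name = c(
    "Alice", "Ben", "Tom", "Jane", "Neil",
    "Alice", "Tom", "Ben", "Jane", "Neil",
    "Alice", "Tom", "Ben", "Jane", "Bob"
  ),
  Location = rep(c("Ward", "Desk", "Op", "Call", "Off"), times = 3),
  Rating = rep(c(1, 10, 8), each = 5),
  stringsAsFactors = FALSE
)

# Self-join on Rating: every person is paired with every person
# (including themselves) who shares the same Rating.
pairs <- merge(df, df, by = "Rating", suffixes = c("_from", "_to")) %>%
  # Keep each unordered pair once and drop self-pairs.
  filter(Name_from < Name_to) %>%
  # as_tbl_graph() reads the first two columns as from and to.
  select(from = Name_from, to = Name_to, Rating)

# Undirected graph: people are nodes, Rating is an edge attribute.
people_graph <- as_tbl_graph(pairs, directed = FALSE)
people_graph
```

With five distinct people per Rating, this keeps ten pairs for each of the three Rating values. The resulting object can be passed to ggraph in the usual way.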