Tags: r, networking, nodes, network-analysis, edge-list

Network edge list issue: only data on participants, can't add in attributes


I have network data (cellphone records) from participants in a study, and I'm trying to build a network from it. The edge list consists only of participants connecting with other, non-participant people: the left-hand column holds the participants, and the right-hand column holds whoever they called. When I use participant attributes to add color, etc. to the network visualization, they aren't applied correctly, because they get applied to every node, so the people in the right-hand column take on the attributes of the participant who called them. I found a workaround for coloring nodes by whether they are a participant, adapted from another person's question (code below), but I can't figure out how to use attributes such as average call length or number of calls to change parts of the visualization, given the way the edge list is set up. Is there another way to do this?

Code workaround for coloring the nodes based on being a participant:

plot(
  BE_1, 
  vertex.label=NA, 
  vertex.color=ifelse(degree(BE_1, mode = "out")>0, "red", "black"),
  vertex.size=ifelse(degree(BE_1, mode = "out")>0, 15, 4),
  edge.arrow.size=.1
)

I've tried setting vertex.size from different attributes, but that didn't work, so here I've just made the non-participants smaller.


Solution

  • From your description, your edge list looks something like this:

    library(tidyverse)
      
    el <- 
      tibble(
        from = sample(1:10, 30, replace = TRUE),
        to = sample(11:200, size = 30),
        duration = runif(30, 3, 200)
      ) |> 
      # adding edges between participants
      bind_rows(
        tibble(
          from = c(5, 9),
          to = c(7, 3)
        )
      )
    
    el
    #> # A tibble: 32 × 3
    #>     from    to duration
    #>    <dbl> <dbl>    <dbl>
    #>  1     3    74    108. 
    #>  2     7   164    125. 
    #>  3     3    87    165. 
    #>  4     3    43    188. 
    #>  5     8   103     73.5
    #>  6     7   144    143. 
    #>  7     5   198    167. 
    #>  8     3   143     97.4
    #>  9     5   183    122. 
    #> 10    10   125     44.2
    #> # ℹ 22 more rows
    

    Create a data set that describes the participants by aggregating the edgelist.

    nodes_participants <- 
      el |> 
      group_by(from) |> 
      # na.rm = TRUE: the participant-to-participant edges added above have no duration
      summarise(avg_dur = mean(duration, na.rm = TRUE),
                n_calls = n(), # same as out degree.
                # this will allow us to differentiate between participants and non-participants
                participant = TRUE) |> 
      rename(name = from)
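
    The same summarise() call can carry any other per-participant attribute you want to map onto the plot. The following is a minimal sketch, assuming a small hand-made edge list (el_demo and the extra columns total_dur and n_contacts are illustrative names, not part of the original data):

    ```r
    library(dplyr)

    # Hypothetical toy edge list: participants 1 and 2 call other people.
    el_demo <- tibble(
      from     = c(1, 1, 1, 2, 2),
      to       = c(11, 12, 11, 13, 1),
      duration = c(10, 20, 30, 40, NA)  # NA: a call with no recorded duration
    )

    participant_attrs <-
      el_demo |>
      group_by(from) |>
      summarise(
        avg_dur     = mean(duration, na.rm = TRUE),  # ignore missing durations
        total_dur   = sum(duration, na.rm = TRUE),   # total time on the phone
        n_calls     = n(),                           # number of outgoing calls
        n_contacts  = n_distinct(to),                # distinct people called
        participant = TRUE
      ) |>
      rename(name = from)

    participant_attrs
    ```

    Each column of the result becomes a vertex attribute once the table is passed as the vertices argument, so it can be used for size or color in the same way as avg_dur and n_calls.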
    

    nodes_randos contains the names/ids of the non-participants. We need it because graph_from_data_frame() requires the vertices data frame to list every node that appears in the edge list.

    nodes_randos <- 
      el |> 
      distinct(name = to) |> 
      mutate(participant = FALSE) |> 
      # drop participants who also appear in the `to` column (participant-to-participant calls)
      filter(!name %in% nodes_participants$name)
    

    Combine both into a single nodes data frame.

    nodes <- 
      bind_rows(nodes_participants,
                nodes_randos) |> 
      replace_na(list(avg_dur = 0, n_calls = 0))
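
    Before building the graph, it can help to confirm that the node table really covers the edge list, since graph_from_data_frame() stops with an error when an id from the edge list is missing from the vertices or when a vertex name is duplicated. A minimal sketch of such a check, using hypothetical toy stand-ins (el_demo, nodes_demo) for el and nodes:

    ```r
    library(dplyr)

    # Hypothetical toy data standing in for `el` and `nodes`.
    el_demo <- tibble(from = c(1, 1, 2), to = c(11, 12, 1))
    nodes_demo <- tibble(
      name        = c(1, 2, 11, 12),
      participant = c(TRUE, TRUE, FALSE, FALSE)
    )

    # Ids used in the edge list that have no row in the node table.
    missing_ids <- setdiff(union(el_demo$from, el_demo$to), nodes_demo$name)

    # Ids that appear more than once in the node table.
    duplicated_ids <- nodes_demo$name[duplicated(nodes_demo$name)]

    # Both vectors should be empty before calling graph_from_data_frame().
    stopifnot(length(missing_ids) == 0, length(duplicated_ids) == 0)
    ```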
    

    Create the igraph object.

    library(igraph)
    
    g <- graph_from_data_frame(el, directed = TRUE, vertices = nodes)
    
    # This is how we can calculate vertex measures and add them to the graph.
    V(g)$out_degree <- degree(g, mode = "out")
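
    Other vertex measures can be attached the same way, and the resulting attributes can be inspected as a table. A small sketch on a hypothetical toy graph (g_demo is illustrative, not the graph built above):

    ```r
    library(igraph)

    # Hypothetical toy data; ids are stored as character vertex names.
    el_demo <- data.frame(from = c("1", "1", "2"), to = c("11", "12", "1"))
    nodes_demo <- data.frame(
      name        = c("1", "2", "11", "12"),
      participant = c(TRUE, TRUE, FALSE, FALSE)
    )
    g_demo <- graph_from_data_frame(el_demo, directed = TRUE, vertices = nodes_demo)

    # Number of incoming calls per node.
    V(g_demo)$in_degree <- degree(g_demo, mode = "in")

    # List the attribute names, then view all vertex attributes as a data frame.
    vertex_attr_names(g_demo)
    igraph::as_data_frame(g_demo, what = "vertices")
    ```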
    

    Some igraph plotting examples.

    plot(g,
         vertex.label=NA, 
         vertex.color=ifelse(V(g)$participant, "red", "black"),
         vertex.size=V(g)$out_degree,
         edge.arrow.size=.1)
    

    plot(g,
         vertex.label=NA, 
         vertex.color = ifelse(V(g)$participant, "red", "black"),
         vertex.size = sqrt(V(g)$avg_dur),
         edge.arrow.size=.1)
    

    plot(g,
         vertex.label=NA, 
         vertex.color = ifelse(degree(g, mode = "out")>0, "red", "black"),
         vertex.size = V(g)$n_calls,
         edge.arrow.size=.1)
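
    In base igraph plots the attribute value is used directly as the vertex size, so nodes with a value of 0 (here, the non-participants) are practically invisible and large values can swamp the plot. One option is to map the attribute onto a fixed size range first. A minimal sketch, where rescale_size() is a hypothetical helper and g_demo a toy graph:

    ```r
    library(igraph)

    # Hypothetical helper: linearly map x onto [min_size, max_size].
    rescale_size <- function(x, min_size = 3, max_size = 15) {
      rng <- range(x, na.rm = TRUE)
      # If all values are equal, fall back to the midpoint of the range.
      if (rng[1] == rng[2]) {
        return(rep((min_size + max_size) / 2, length(x)))
      }
      min_size + (x - rng[1]) / (rng[2] - rng[1]) * (max_size - min_size)
    }

    # Hypothetical toy graph.
    el_demo <- data.frame(from = c("1", "1", "2"), to = c("11", "12", "1"))
    nodes_demo <- data.frame(
      name        = c("1", "2", "11", "12"),
      participant = c(TRUE, TRUE, FALSE, FALSE)
    )
    g_demo <- graph_from_data_frame(el_demo, directed = TRUE, vertices = nodes_demo)
    V(g_demo)$n_calls <- degree(g_demo, mode = "out")

    plot(g_demo,
         vertex.label = NA,
         vertex.color = ifelse(V(g_demo)$participant, "red", "black"),
         vertex.size = rescale_size(V(g_demo)$n_calls),
         edge.arrow.size = 0.1)
    ```

    With the defaults chosen here, the node with the fewest calls is drawn at size 3 and the one with the most at size 15.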
    

    ggraph plotting example.

    ggraph() takes care of scaling the node size.

    library(ggraph)
    
    ggraph(g, layout = "nicely") +
      geom_edge_link() +
      geom_node_point(aes(color = participant, size = avg_dur))
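
    Because ggraph builds on ggplot2, the usual scale functions can be used to control how the attributes are drawn. A minimal sketch on a hypothetical toy graph (g_demo and its avg_dur values are made up for illustration), with the size range and colors picked by hand:

    ```r
    library(igraph)
    library(ggraph)  # attaches ggplot2 as well

    # Hypothetical toy graph with a made-up avg_dur attribute.
    el_demo <- data.frame(from = c("1", "1", "2"), to = c("11", "12", "1"))
    nodes_demo <- data.frame(
      name        = c("1", "2", "11", "12"),
      participant = c(TRUE, TRUE, FALSE, FALSE),
      avg_dur     = c(120, 45, 0, 0)
    )
    g_demo <- graph_from_data_frame(el_demo, directed = TRUE, vertices = nodes_demo)

    p <- ggraph(g_demo, layout = "nicely") +
      geom_edge_link() +
      geom_node_point(aes(color = participant, size = avg_dur)) +
      # smallest and largest point size used for avg_dur
      scale_size_continuous(range = c(2, 8)) +
      # red for participants, black for everyone else
      scale_color_manual(values = c(`TRUE` = "red", `FALSE` = "black")) +
      theme_void()

    p
    ```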