I have network data (cellphone data) from participants in a study, and I'm trying to build a network from it. The edgelist is made up of participants connecting with other, random people: the left-hand column contains the participants, and the right-hand column contains everyone they called. When I use the participants' attributes to add color, etc. to the network visualization, the attributes don't apply correctly. They get applied to all nodes, so the random people in the right-hand column take on the same attributes as the participant who called them. I found a workaround, based on another person's question, for coloring the nodes by whether or not they are participants (code below). However, I can't figure out how to use the attributes (e.g. average call length, number of calls) to change other parts of the visualization, given the way the edgelist is set up. Is there another way to do this?
Code for the workaround that colors nodes by whether they are participants:
# Only participants have outgoing edges (they made the calls),
# so out-degree > 0 identifies them
plot(
BE_1,
vertex.label=NA,
vertex.color=ifelse(degree(BE_1, mode = "out")>0, "red", "black"),
vertex.size=ifelse(degree(BE_1, mode = "out")>0, 15, 4),
edge.arrow.size=.1
)
I've tried setting vertex.size based on different attributes, but that hasn't worked, so for now I've just made the non-participants smaller.
From your description, your edgelist looks something like this (the data are randomly generated, so your values will differ):
library(tidyverse)
el <-
tibble(
from = sample(1:10, 30, replace = TRUE),
to = sample(11:200, size = 30),
duration = runif(30, 3, 200)
) |>
# adding edges between participants (these have no duration, i.e. NA)
bind_rows(
tibble(
from = c(5, 9),
to = c(7, 3)
)
)
el
#> # A tibble: 32 × 3
#> from to duration
#> <dbl> <dbl> <dbl>
#> 1 3 74 108.
#> 2 7 164 125.
#> 3 3 87 165.
#> 4 3 43 188.
#> 5 8 103 73.5
#> 6 7 144 143.
#> 7 5 198 167.
#> 8 3 143 97.4
#> 9 5 183 122.
#> 10 10 125 44.2
#> # ℹ 22 more rows
Create a data set that describes the participants by aggregating the edgelist.
nodes_participants <-
el |>
group_by(from) |>
# na.rm = TRUE so participant-to-participant calls (duration NA) don't turn the mean into NA
summarise(avg_dur = mean(duration, na.rm = TRUE),
n_calls = n(), # same as out degree.
# this will allow us to differentiate between participants and non-participants
participant = TRUE) |>
rename(name = from)
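If you'd rather not depend on dplyr, here is a minimal base R sketch of the same per-participant summary. It uses tapply() and table() on a small made-up edge list, and el_toy and nodes_toy are hypothetical names. Calls with a missing duration (like the participant-to-participant rows added above) are skipped when averaging but still counted as calls.

```r
# Hypothetical toy edge list: participant 1 made two calls, participant 2 made two,
# and one of participant 2's calls has no recorded duration (NA)
el_toy <- data.frame(
  from     = c(1, 1, 2, 2),
  to       = c(11, 12, 13, 1),
  duration = c(10, 30, 50, NA)
)

# Mean call duration per caller; na.rm = TRUE ignores missing durations
avg_dur <- tapply(el_toy$duration, el_toy$from, mean, na.rm = TRUE)

# Number of calls per caller (counts every edge, including those with NA duration)
n_calls <- table(el_toy$from)

nodes_toy <- data.frame(
  name        = names(avg_dur),
  avg_dur     = as.numeric(avg_dur),
  n_calls     = as.integer(n_calls[names(avg_dur)]),
  participant = TRUE
)
nodes_toy
```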
nodes_randos contains the names/IDs of the non-participants. We need it because, when you pass a vertices data frame, igraph requires that data frame to contain every node that appears in the edgelist.
nodes_randos <-
el |>
distinct(name = to) |>
mutate(participant = FALSE) |>
# drop participants who were also called by other participants, so no node appears twice
filter(!name %in% nodes_participants$name)
Create the nodes data frame:
nodes <-
bind_rows(nodes_participants,
nodes_randos) |>
replace_na(list(avg_dur = 0, n_calls = 0))
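Before building the graph, it can be worth checking the node table. As far as I know, graph_from_data_frame() stops with an error if a vertex in the edgelist is missing from vertices, or if a vertex name appears more than once. Here is a minimal base R sketch of that check on hypothetical toy data (el_toy, nodes_toy):

```r
# Hypothetical toy edge list and node table, mirroring el and nodes above
el_toy <- data.frame(from = c(1, 1, 2), to = c(11, 12, 1))
nodes_toy <- data.frame(
  name        = c(1, 2, 11, 12),
  participant = c(TRUE, TRUE, FALSE, FALSE)
)

# Every vertex referenced in the edge list must appear in the node table...
missing_ids <- setdiff(unique(c(el_toy$from, el_toy$to)), nodes_toy$name)

# ...and no vertex may appear twice (e.g. a participant also listed as a non-participant)
duplicated_ids <- nodes_toy$name[duplicated(nodes_toy$name)]

stopifnot(length(missing_ids) == 0, length(duplicated_ids) == 0)
```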
Create the igraph object:
library(igraph)
g <- graph_from_data_frame(el, directed = TRUE, vertices = nodes)
# This is how we can calculate vertex measures and add them to the graph.
V(g)$out_degree <- degree(g, mode = "out")
plot(g,
vertex.label=NA,
vertex.color=ifelse(V(g)$participant, "red", "black"),
vertex.size=V(g)$out_degree,
edge.arrow.size=.1)
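Vertex measures don't have to come from the aggregated node table. Edge attributes can also be summed onto vertices inside igraph. In this minimal sketch on a hypothetical toy graph (g_toy), strength() with the weights set to the call duration gives each caller's total call time. Treating a missing duration as 0 is an assumption made here so that the sums are defined.

```r
library(igraph)

# Hypothetical toy call data: participant "1" called "11" twice and "12" once;
# participant "2" called "1" once, with no recorded duration (NA)
el_toy <- data.frame(
  from     = c("1", "1", "1", "2"),
  to       = c("11", "11", "12", "1"),
  duration = c(10, 20, 5, NA)
)
g_toy <- graph_from_data_frame(el_toy, directed = TRUE)

# Assumption: a missing duration counts as 0 seconds of talk time
w <- E(g_toy)$duration
w[is.na(w)] <- 0

# strength() sums the weights of each vertex's edges; mode = "out" uses calls made
V(g_toy)$total_dur  <- strength(g_toy, mode = "out", weights = w)
V(g_toy)$out_degree <- degree(g_toy, mode = "out")

setNames(V(g_toy)$total_dur, V(g_toy)$name)
```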
plot(g,
vertex.label=NA,
vertex.color = ifelse(V(g)$participant, "red", "black"),
vertex.size = sqrt(V(g)$avg_dur),
edge.arrow.size=.1)
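plot.igraph() uses vertex.size values as given (the default is 15), which is why it can take some trial and error (like the sqrt() above) to get readable node sizes. Below is a minimal sketch of a hypothetical helper, rescale_size(). It linearly maps an attribute into a chosen size range and gives missing values the smallest size.

```r
# Hypothetical helper: linearly rescale a numeric vertex attribute into a size
# range for plot.igraph(); NA values get the `missing` size
rescale_size <- function(x, to = c(3, 15), missing = to[1]) {
  rng <- range(x, na.rm = TRUE)
  if (diff(rng) == 0) {
    # all (non-missing) values are equal: use the middle of the range
    out <- rep(mean(to), length(x))
  } else {
    out <- to[1] + (x - rng[1]) / diff(rng) * diff(to)
  }
  out[is.na(x)] <- missing
  out
}

rescale_size(c(0, 50, 100))
```

For example: plot(g, vertex.label = NA, vertex.size = rescale_size(V(g)$avg_dur), edge.arrow.size = .1). Non-participants, whose avg_dur is 0, end up at the lower bound.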
plot(g,
vertex.label=NA,
vertex.color = ifelse(degree(g, mode = "out")>0, "red", "black"),
vertex.size = V(g)$n_calls,
edge.arrow.size=.1)
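If you already have a graph like BE_1 and a separate table of participant attributes, you don't have to rebuild the graph. Instead, you can look up each vertex by name with match(). Only participants then receive attribute values, and everyone else gets NA (or a default you choose). Here is a minimal sketch, with a hypothetical g_toy standing in for BE_1 and a hypothetical participants table:

```r
library(igraph)

# Hypothetical graph standing in for BE_1: participants "1" and "2"
# called non-participants "11", "12" and "13"
g_toy <- graph_from_data_frame(
  data.frame(from = c("1", "1", "2"), to = c("11", "12", "13")),
  directed = TRUE
)

# Hypothetical participant-level attributes, one row per participant
participants <- data.frame(
  id      = c("1", "2"),
  avg_dur = c(120, 45),
  n_calls = c(2, 1)
)

# Row of `participants` for each vertex; NA for non-participants
idx <- match(V(g_toy)$name, participants$id)

V(g_toy)$participant <- !is.na(idx)
V(g_toy)$avg_dur     <- participants$avg_dur[idx]

# Give non-participants a default of 0 calls instead of NA
n <- participants$n_calls[idx]
n[is.na(n)] <- 0
V(g_toy)$n_calls <- n
```

The filled-in attributes can then drive plot() directly, e.g. vertex.size = ifelse(V(g_toy)$participant, 4 + 3 * V(g_toy)$n_calls, 4).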
ggraph plotting example. In ggraph(), node size is mapped through a ggplot2 size scale, so the values are rescaled for you:
library(ggraph)
ggraph(g, layout = "nicely") +
geom_edge_link() +
geom_node_point(aes(color = participant, size = avg_dur))