r tidytext ggraph

Mapping word count to node size in a co-occurrence network chart using tidytext


I'd like to build a co-occurrence network chart similar to the one in section 8.2.2 of David Robinson and Julia Silge's Tidy Text Mining book, except that I would like the size of each node to depend on how many times the term appears in the data.

The chart was produced with the following code:

library(tidytext)
library(tidyverse)
library(widyr)
library(igraph)
library(ggraph)
library(jsonlite)

metadata <- fromJSON("https://data.nasa.gov/data.json")
# tibble() replaces the deprecated data_frame()
nasa_keyword <- tibble(id = metadata$dataset$`_id`$`$oid`,
                       keyword = metadata$dataset$keyword) %>%
  unnest(keyword)

keyword_cors <- nasa_keyword %>% 
  group_by(keyword) %>%
  filter(n() >= 50) %>%
  pairwise_cor(keyword, id, sort = TRUE, upper = FALSE)

set.seed(1234)
keyword_cors %>%
  filter(correlation > .6) %>%
  graph_from_data_frame() %>%
  ggraph(layout = "fr") +
  geom_edge_link(aes(edge_alpha = correlation, edge_width = correlation), edge_colour = "royalblue") +
  geom_node_point(size = 5) +
  geom_node_text(aes(label = name), repel = TRUE,
                 point.padding = unit(0.2, "lines")) +
  theme_void()

I've been experimenting with geom_node_point(aes(size = ??)), but I can't figure out what to map to size. Part of the problem is that graph_from_data_frame() turns the data frame into a fairly complex-looking object.


Solution

  • I would like to have the sizes of the nodes change depending on how many times the term shows up in the data

    You could pass the keyword counts to graph_from_data_frame() through its vertices argument:

    set.seed(1234)
    keyword_cors %>%
      filter(correlation > .6) %>% 
      graph_from_data_frame(vertices = nasa_keyword %>% count(keyword) %>% filter(n >= 50)) %>% 
      ggraph(layout = "fr") +
      geom_edge_link(aes(edge_alpha = correlation, edge_width = correlation), 
                     edge_colour = "royalblue") +
      geom_node_point(aes(size = n)) + scale_size(range = c(1,10)) + 
      geom_node_text(aes(label = name), repel = TRUE,
                     point.padding = unit(0.2, "lines")) +
      theme_void()
    

    This gives you the same network, with each node sized by how often its keyword occurs.

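    • vertices = nasa_keyword %>% count(keyword) %>% filter(n >= 50) adds node information to the graph: graph_from_data_frame() treats the first column (keyword) as the node name and stores every remaining column, here the occurrence count n, as a node attribute. Every name that appears in the edge list must be present in this data frame; keywords listed here that have no edge above the 0.6 threshold still appear as isolated nodes.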
    • aes(size = n) maps this information to the node size.
    • scale_size(range = c(1,10)) lets you define the minimum and maximum point sizes.
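
To see what the vertices argument does without downloading the NASA metadata, here is a minimal sketch on made-up data. The names toy_keyword, toy_edges, toy_counts and connected are hypothetical and exist only for illustration; toy_edges stands in for the filtered keyword_cors. It shows that the count column ends up as a node attribute, and one possible way to drop isolated nodes if you don't want them in the chart.

```r
library(dplyr)
library(igraph)

# Hypothetical toy data: one row per (document id, keyword) pair
toy_keyword <- data.frame(
  id      = c(1, 1, 2, 2, 3, 3, 3, 4),
  keyword = c("ocean", "ice", "ocean", "ice", "ocean", "ice", "wind", "wind"),
  stringsAsFactors = FALSE
)

# Hypothetical edge list standing in for the filtered keyword_cors
toy_edges <- data.frame(
  item1 = "ocean", item2 = "ice", correlation = 0.9,
  stringsAsFactors = FALSE
)

# One row per keyword; the first column becomes the node name,
# the second column (n) becomes a node attribute
toy_counts <- count(toy_keyword, keyword)

# Every row of `vertices` becomes a node, so "wind" is included
# even though no edge touches it
g_all <- graph_from_data_frame(toy_edges, vertices = toy_counts)
V(g_all)$name            # node names taken from the keyword column
V(g_all)$n               # counts, available to aes(size = n) in ggraph
vertex_attr_names(g_all) # lists the attributes stored on the nodes

# Optional: keep only keywords that occur in at least one edge
connected <- toy_counts %>%
  filter(keyword %in% c(toy_edges$item1, toy_edges$item2))
g_connected <- graph_from_data_frame(toy_edges, vertices = connected)
vcount(g_connected)      # number of nodes left after dropping isolated ones
```

In the real pipeline, the same idea would mean filtering the count table to the keywords that appear in item1 or item2 of the filtered keyword_cors before passing it as vertices. Whether isolated nodes are useful context or clutter depends on the chart you want.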