
ggraph - increase node size based on frequency


I've been reading Text Mining with R by Julia Silge and David Robinson (https://www.tidytextmining.com/nasa.html) and am stumped on how to make the node size scale with frequency (n). I tried the following code...

 library(dplyr)
 library(igraph)
 library(ggraph)
 library(widyr)
 set.seed(1234)
 title_word_pairs %>%
   filter(n >= 250) %>%
   graph_from_data_frame() %>%
   ggraph(layout = "fr") +
   geom_edge_link(aes(edge_alpha = n, edge_width = n), edge_colour = "royalblue") +
   geom_node_point(aes(size = n)) + scale_size(range = c(2, 10)) +
   geom_node_text(aes(label = name), repel = TRUE,
                  point.padding = unit(0.2, "lines")) +
   theme_void()

...and receive this error...

 Error: Column `size` must be a 1d atomic vector or a list
 Call `rlang::last_error()` to see a backtrace

Any thoughts or ideas would be appreciated.


Solution

  • The issue is that the frequency n belongs to edges, not vertices. geom_edge_link finds n because n is an edge attribute, while geom_node_point doesn't find n because it is not among the vertex attributes.

    So we need to construct another variable that holds the actual vertex frequency.

    library(tidyr)  # for gather()

    # Same filtered edge list as in the question
    subt <- title_word_pairs %>%
      filter(n >= 250)
    # Stack both endpoints into one column, then sum the edge counts per word
    vert <- subt %>% gather(item, word, item1, item2) %>%
      group_by(word) %>% summarise(n = sum(n))
    
    # Pass the per-word totals as vertex attributes
    subt %>%
      graph_from_data_frame(vertices = vert) %>%
      ggraph(layout = "fr") +
      geom_edge_link(aes(edge_alpha = n, edge_width = n), edge_colour = "royalblue") +
      geom_node_point(aes(size = n)) + scale_size(range = c(2, 10)) +
      geom_node_text(aes(label = name), repel = TRUE, point.padding = unit(0.2, "lines")) +
      theme_void()
    

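    Here subt is the same filtered edge list as before, and vert contains two columns: the vertices (words) and their frequency in subt, computed as the sum of the frequencies of the edges each word belongs to. Lastly, I added vertices = vert to pass this frequency as a vertex attribute. Note that graph_from_data_frame treats the first column of vertices as the vertex name, so word has to come first.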
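
To see the edge-versus-vertex distinction without the NASA data, here is a minimal sketch on a made-up three-row edge list. The pairs tibble and its counts are hypothetical stand-ins for title_word_pairs, and pivot_longer is swapped in for gather (it is tidyr's newer equivalent, not what the answer above used).

```r
library(igraph)
library(dplyr)
library(tidyr)

# Hypothetical toy stand-in for title_word_pairs (counts are made up)
pairs <- tibble(
  item1 = c("data", "data", "global"),
  item2 = c("global", "system", "system"),
  n     = c(300, 250, 400)
)

# Without `vertices`, n exists only on the edges
g <- graph_from_data_frame(pairs)
edge_attr_names(g)    # "n"
vertex_attr_names(g)  # "name"

# Per-word total: sum n over every edge the word takes part in
vert <- pairs %>%
  pivot_longer(c(item1, item2), names_to = "item", values_to = "word") %>%
  group_by(word) %>%
  summarise(n = sum(n))

# With `vertices`, n is also available to node geoms
g2 <- graph_from_data_frame(pairs, vertices = vert)
vertex_attr_names(g2)  # "name" "n"
```

Checking vertex_attr_names() before plotting is a quick way to confirm that aes(size = n) in geom_node_point will find the column it needs.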