r igraph edges

How to generate an igraph-compatible edge set in R from data


I have a data set that contains a set of words along with the paragraph each word originally appeared in, like this:

word <- c("wind", "statement", "card", "growth", "egg", "caption", "statement", "robin", "growth")
paragraph <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
data <- data.frame(word, paragraph)

and I'm trying to generate an edge list for igraph from it that connects words that co-occur in the same paragraph, like this:

node1 <- c("wind", "wind", "statement", "statement", "card", "card", "growth", "growth", "egg", "egg", "caption", "caption", "statement", "statement", "robin", "robin", "growth", "growth")
node2 <- c("statement", "card", "wind", "card", "wind", "statement", "egg", "caption", "growth", "caption", "growth", "egg", "robin", "growth", "statement", "growth", "statement", "robin")
edges <- data.frame(node1, node2)

So far I've only figured out how to calculate the correlations between each word based on paragraph using

library(dplyr); library(widyr)  # %>% and group_by() come from dplyr, pairwise_cor() from widyr
data <- data %>% group_by(word) %>% pairwise_cor(word, paragraph, sort = TRUE)

from the widyr package. However, for other manipulations I want to run, I really need the edges to be the actual number of co-occurrences rather than a correlation coefficient. Does anyone know of some code that could do this for me? Any help would be much appreciated!


Solution

  • I am not quite sure what you mean when you say "I really need the edges to be the actual number of co-occurrences rather than a correlation coefficient". However, "I'm trying to generate an edge list for an igraph from it that connects each word based on its co-occurrence in a paragraph" seems pretty clear. I interpret that to mean that if two words are in the same paragraph, they get linked. You can make that kind of edge list using combn like this:

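    library(igraph)

    # Collect the word pairs for each paragraph; combn() returns a 2 x k
    # matrix of row indices, one column per pair.
    Edges <- c()
    for (p in unique(data$paragraph)) {
        idx <- which(data$paragraph == p)
        if (length(idx) < 2) next  # a lone word has no pairs
        Edges <- c(Edges, as.character(data$word)[combn(idx, 2)])
    }
    # Consecutive elements form one pair, so fill the matrix row by row.
    EL <- matrix(Edges, ncol = 2, byrow = TRUE)

    g <- graph_from_edgelist(EL, directed = FALSE)
    plot(g)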
    

    [Figure: graph from paragraphs]
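The question also mentions wanting "the actual number of co-occurrences". One possible reading is a count, per word pair, of how many paragraphs contain both words. The following is a minimal base R sketch under that assumption, not necessarily what the asker had in mind. The names `pairs_by_paragraph`, `pairs`, `edge_counts`, and the column `n` are made up for illustration.

```r
# Sample data from the question
word <- c("wind", "statement", "card", "growth", "egg", "caption",
          "statement", "robin", "growth")
paragraph <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
data <- data.frame(word, paragraph, stringsAsFactors = FALSE)

# Build every unordered word pair within each paragraph.
pairs_by_paragraph <- lapply(split(data$word, data$paragraph), function(w) {
  w <- sort(unique(w))             # canonical order, so (a, b) and (b, a) match
  if (length(w) < 2) return(NULL)  # a single-word paragraph yields no pairs
  t(combn(w, 2))                   # one row per pair
})
pairs <- do.call(rbind, pairs_by_paragraph)

# Count how many paragraphs each pair shares (assumed meaning of "co-occurrences").
edge_counts <- aggregate(
  n ~ node1 + node2,
  data = data.frame(node1 = pairs[, 1], node2 = pairs[, 2], n = 1L,
                    stringsAsFactors = FALSE),
  FUN = sum
)
```

With the sample data, no pair of words shares more than one paragraph, so every count is 1. Passing `edge_counts` to igraph's `graph_from_data_frame(edge_counts, directed = FALSE)` should carry `n` along as an edge attribute. The widyr package also has a `pairwise_count()` function that may fit here, although it was not tested for this answer.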