
Building network in R


I have a csv file which looks like this:

"","people_id","commit_id"
 "1",1,0
 "2",1,117
 "3",1,144
 "4",1,278
 …
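If you want to reproduce the problem without the full file, the sample rows above can be read into R directly from a string. This is a minimal sketch. It assumes the unnamed first column is only a row index, which is why it uses `row.names = 1`.

```r
# Rows copied from the sample above; the unnamed first column is a row index
csv_text <- '"","people_id","commit_id"
"1",1,0
"2",1,117
"3",1,144
"4",1,278'

# row.names = 1 turns the unnamed index column into row names,
# leaving only people_id and commit_id as data columns
commits <- read.csv(text = csv_text, row.names = 1)
str(commits)
```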

Here's the CSV file if you want to look at it. It contains 11735 lines but only 5923 unique people ids.

Does anyone know how to connect people ids that share a common commit_id, while ignoring commit_id 0 (a commit with id 0 does not exist)?
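One way to ignore commit_id 0 is to drop those rows before building any edges. The catch is that a person whose only row has commit_id 0 would then vanish. The sketch below works around this by collecting the full list of people before filtering. The toy data frame is hypothetical and only mimics the question's columns.

```r
# Hypothetical toy data in the same shape as the question's CSV
commits <- data.frame(people_id = c(1, 1, 2, 2, 3, 4),
                      commit_id = c(0, 117, 117, 144, 144, 0))

# All people, taken before filtering, so nobody is lost as a vertex
all_people <- sort(unique(commits$people_id))

# commit_id 0 does not exist, so those rows carry no edge information
real_commits <- commits[commits$commit_id != 0, ]

all_people
real_commits
```

Here person 4 only appears with commit_id 0. They remain in `all_people` but have no rows in `real_commits`.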

So far I have done this:

library(Matrix)  # spMatrix(), tcrossprod()
library(igraph)  # graph_from_adjacency_matrix(), write_graph()

# read the csv file
commitsNetwork <- read.csv("commits.csv", header=TRUE)

# keep only the two columns needed
# (rows with commit_id 0 are still included at this point)
commitsNetwork <- commitsNetwork[c("people_id", "commit_id")]

#build a people x commits incidence matrix
C <- spMatrix(nrow = length(unique(commitsNetwork$people_id)),
              ncol = length(unique(commitsNetwork$commit_id)),
              i = as.numeric(factor(commitsNetwork$people_id)),
              j = as.numeric(factor(commitsNetwork$commit_id)),
              x = rep(1, nrow(commitsNetwork)))
rownames(C) <- levels(factor(commitsNetwork$people_id))
colnames(C) <- levels(factor(commitsNetwork$commit_id))
# people x people matrix: entry (p, q) = number of commits p and q share
adjC <- tcrossprod(C)
comG <- graph_from_adjacency_matrix(adjC, mode = "undirected", weighted = TRUE, diag = FALSE)

#write to pajek file
write_graph(comG, "comNetwork.net", format = "pajek")

Also, the edges come from the "commit_id" column: two vertices (people) should be connected if they share a common commit_id.

I'm not sure how to generate this network from the CSV file in R.
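The rule "connect two people if they share a commit_id" can also be expressed as a self-join on commit_id, which yields a weighted edge list directly. This is a sketch using base R `merge()` and `aggregate()` rather than the sparse-matrix approach used elsewhere on this page. The toy data is hypothetical and has commit_id 0 already removed.

```r
# Hypothetical toy data in the question's shape, commit_id 0 already removed
real_commits <- data.frame(people_id = c(1, 2, 2, 3, 1),
                           commit_id = c(117, 117, 144, 144, 144))
real_commits <- unique(real_commits)  # a repeated (person, commit) row would inflate weights

# Self-join on commit_id: one row per pair of people on the same commit
pairs <- merge(real_commits, real_commits, by = "commit_id")

# Keep each unordered pair once and drop a person paired with themself
pairs <- pairs[pairs$people_id.x < pairs$people_id.y, ]

# Weight = number of commits the two people share
edges <- aggregate(commit_id ~ people_id.x + people_id.y, data = pairs, FUN = length)
names(edges) <- c("from", "to", "weight")
edges
```

A self-join creates k^2 rows for a commit with k people, so on large data the sparse-matrix version may use less memory.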

The ideal output should look like this:

*Vertices 5923 1
2
3
4
...
*Edges
1 4 1
1 25 1
1 39 1
1 41 1
1 48 1
until 5923...
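The desired `*Vertices 5923` line means every person should be a vertex, including people who share no commit with anyone. A minimal sketch of one way to get that: pass the full list of people to igraph's `graph_from_data_frame()` through its `vertices` argument, then write the result in Pajek format. The edge list and people below are hypothetical stand-ins for the real data. The assumption is that igraph's Pajek writer emits the `weight` edge attribute as the third column of each edge line.

```r
library(igraph)

# Hypothetical weighted edge list and the full set of people
edges <- data.frame(from = c(1, 1, 2), to = c(2, 3, 3), weight = c(2, 1, 1))
all_people <- data.frame(name = 1:4)  # person 4 shares no commits

# Passing vertices keeps isolated people, so *Vertices counts everyone
g <- graph_from_data_frame(edges, directed = FALSE, vertices = all_people)

# Write to a temporary Pajek file and print it; the weight edge attribute
# is expected to appear as the third column of each edge line
out <- tempfile(fileext = ".net")
write_graph(g, out, format = "pajek")
cat(readLines(out), sep = "\n")
```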


Solution

  • Maybe you want something like this:

    library(igraph)
    library(Matrix)
    
    # download the example file to a temporary location
    tf <- tempfile(fileext = ".csv")
    download.file("https://www.dropbox.com/s/q7sxfwjec97qzcy/people.csv?dl=1",
                  tf, mode = "wb")
    people <- read.csv(tf)
    
    # people x repositories incidence matrix
    A <- spMatrix(nrow = length(unique(people$people)),
                  ncol = length(unique(people$repository_id)),
                  i = as.numeric(factor(people$people)),
                  j = as.numeric(factor(people$repository_id)),
                  x = rep(1, nrow(people)))
    rownames(A) <- levels(factor(people$people))
    colnames(A) <- levels(factor(people$repository_id))
    # adj[p, q] = number of repositories shared by people p and q
    adj <- tcrossprod(A)
    g <- graph_from_adjacency_matrix(adj, mode = "undirected", weighted = TRUE, diag = FALSE)
    

    See also here.
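The answer above uses the columns `people` and `repository_id` from its own example file. The sketch below adapts the same incidence-matrix idea to the question's `people_id` / `commit_id` columns and drops commit_id 0. It uses `Matrix::sparseMatrix()` in place of `spMatrix()`. The people factor is built before filtering, so people who only had commit_id 0 still become (isolated) vertices. The toy data is hypothetical.

```r
library(Matrix)
library(igraph)

# Hypothetical toy data with the question's column names
commits <- data.frame(people_id = c(1, 1, 2, 2, 3, 4),
                      commit_id = c(0, 117, 117, 144, 144, 0))

# Factor levels taken before filtering, so person 4 stays a vertex
people <- factor(commits$people_id)
keep <- commits$commit_id != 0           # ignore the non-existent commit 0
commit_f <- factor(commits$commit_id[keep])

# People x commits incidence matrix; duplicate (i, j) entries are summed
A <- sparseMatrix(i = as.integer(people[keep]), j = as.integer(commit_f), x = 1,
                  dims = c(nlevels(people), nlevels(commit_f)),
                  dimnames = list(levels(people), levels(commit_f)))

# Entry (p, q) of A %*% t(A) = number of commits shared by people p and q
adj <- tcrossprod(A)
g <- graph_from_adjacency_matrix(adj, mode = "undirected", weighted = TRUE, diag = FALSE)
```

Writing `g` with `write_graph(g, "comNetwork.net", format = "pajek")` should then give one vertex per person id.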