
Creating relational matrices with R


My data frame lists projects, the individuals who took part in each of them, and the year in which each project was carried out.

How can I create, for each year, an n x n relational matrix (n being the number of individuals) that counts the number of collaborations between each pair of individuals?

Consider the following example that reproduces the desired structure:

# Example dataframe
set.seed(1)
tp=cbind(paste(rep("project",10),1:10,sep=""),sample(2005:2010,10,replace=T))
tp=tp[sample(1:10,50,T),]
id=sample(paste(rep("id",10),1:10,sep=""),50,T)
df=as.data.frame(cbind(tp,id));rm(tp,id)
names(df)=c("project","year","id")
df=df[order(df$project,df$id),]

df[1:10,]
# project  year id
# project1 2006 id1
# project1 2006 id3
# project1 2006 id5
# project1 2006 id5
# project4 2006 id3
# project4 2006 id4
# project5 2006 id3
# project5 2006 id4
# project6 2008 id2
# project6 2008 id3

As an example, a relational matrix for the year 2006 would look like this:

    id1 id2 id3 id4 id5
id1  0   0   1   0   1
id2  0   0   0   0   0
id3  1   0   0   2   1
id4  0   0   2   0   0
id5  1   0   1   0   0

# link between 1 and 3, 1 and 5, 3 and 5 on project 1
# links between 3 and 4 on project 4 and project 5
# the matrix is symmetric
# the diagonal is 0 because an individual cannot collaborate with themselves

Solution

  • I changed your sampling code slightly so that the number of projects differs from the number of ids. That made it easier to check that the result really is an n x n matrix over individuals rather than over projects. Here's code that works:

    set.seed(1)
    tp=cbind(paste(rep("project",5),1:5,sep=""),sample(2008:2010,5,replace=T))
    tp=tp[sample(1:5,20,T),]
    id=sample(paste(rep("id",10),1:10,sep=""),20,T)
    df=as.data.frame(cbind(tp,id));rm(tp,id)
    names(df)=c("project","year","id")
    df=df[order(df$project,df$id),]
    
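    spl=split(df,df$year)   ## one data frame per year
    net=lapply(spl,function(x){
      m = table(x$id, x$project)  ## id x project incidence table
      m[m > 1] <- 1        ## a repeated (id, project) row counts once
      res = tcrossprod(m)  ## equivalently: res = m %*% t(m); shared projects per pair
      diag(res) <- 0       ## nobody collaborates with themselves
      res                  ## counts, as asked; use (res > 0) * 1 for 0/1 flags
    })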
    net
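
One caveat, offered as a hedged note rather than as part of the original answer: the output below lists the same eight ids for every year, which relies on `id` being a factor so that `table()` keeps unused levels. Since R 4.0.0, `as.data.frame()` no longer converts strings to factors by default, so each year's matrix would contain only the ids active in that year. A minimal sketch, using a small hypothetical data frame `toy` (not the sample data above), that fixes the levels explicitly:

```r
# Hypothetical toy data (not the answer's sample): ids stored as plain strings
toy <- data.frame(
  project = c("p1", "p1", "p2", "p2"),
  year    = c("2008", "2008", "2009", "2009"),
  id      = c("id1", "id2", "id2", "id3"),
  stringsAsFactors = FALSE
)

# Fix the set of individuals once, so every year's table() has the same rows
toy$id <- factor(toy$id, levels = sort(unique(toy$id)))

net_toy <- lapply(split(toy, toy$year), function(x) {
  m <- table(x$id, x$project)   # id x project incidence, all id levels kept
  res <- tcrossprod(m)          # shared-project counts between ids
  diag(res) <- 0                # no self-collaboration
  res
})

sapply(net_toy, dim)            # dimensions of each year's matrix
```

With the levels set up front, both years should give a 3 x 3 matrix even though only two ids are active in each year.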
    

    Split Data:

    $`2008`
        project year  id
    5  project1 2008 id4
    7  project1 2008 id6
    19 project1 2008 id6
    2  project5 2008 id1
    13 project5 2008 id2
    1  project5 2008 id4
    16 project5 2008 id9
    
    $`2009`
        project year  id
    9  project2 2009 id2
    6  project2 2009 id5
    20 project2 2009 id6
    17 project2 2009 id7
    14 project2 2009 id8
    11 project3 2009 id7
    
    $`2010`
        project year  id
    3  project4 2010 id4
    8  project4 2010 id5
    15 project4 2010 id5
    12 project4 2010 id8
    18 project4 2010 id8
    4  project4 2010 id9
    10 project4 2010 id9
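
To see what the matrix product does, here is a hedged sketch that rebuilds the 2008 slice by hand from the seven 2008 rows printed above (the name `y2008` is introduced here for illustration). Each row of the incidence table is an id and each column a project; multiplying the table by its transpose sums, over projects, the product of two ids' entries, which is non-zero exactly when the two ids share a project.

```r
# The seven 2008 rows shown above, typed in by hand
y2008 <- data.frame(
  project = c("project1", "project1", "project1",
              "project5", "project5", "project5", "project5"),
  id      = c("id4", "id6", "id6", "id1", "id2", "id4", "id9")
)

m <- table(y2008$id, y2008$project)  # id x project incidence (id6 appears twice on project1)
m

res <- tcrossprod(m)                 # same as m %*% t(m)
diag(res) <- 0                       # drop self-links
res <- (res > 0) * 1                 # reduce to 0/1 flags, matching the printed matrices
res
```

Only the five ids active in 2008 appear here because the hand-typed column is not a factor with the full set of levels.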
    

    Adjacency matrices for each year (the `net` list):

    $`2008`
    
          id1 id2 id4 id5 id6 id7 id8 id9
      id1   0   1   1   0   0   0   0   1
      id2   1   0   1   0   0   0   0   1
      id4   1   1   0   0   1   0   0   1
      id5   0   0   0   0   0   0   0   0
      id6   0   0   1   0   0   0   0   0
      id7   0   0   0   0   0   0   0   0
      id8   0   0   0   0   0   0   0   0
      id9   1   1   1   0   0   0   0   0
    
    $`2009`
    
          id1 id2 id4 id5 id6 id7 id8 id9
      id1   0   0   0   0   0   0   0   0
      id2   0   0   0   1   1   1   1   0
      id4   0   0   0   0   0   0   0   0
      id5   0   1   0   0   1   1   1   0
      id6   0   1   0   1   0   1   1   0
      id7   0   1   0   1   1   0   1   0
      id8   0   1   0   1   1   1   0   0
      id9   0   0   0   0   0   0   0   0
    
    $`2010`
    
          id1 id2 id4 id5 id6 id7 id8 id9
      id1   0   0   0   0   0   0   0   0
      id2   0   0   0   0   0   0   0   0
      id4   0   0   0   1   0   0   1   1
      id5   0   0   1   0   0   0   1   1
      id6   0   0   0   0   0   0   0   0
      id7   0   0   0   0   0   0   0   0
      id8   0   0   1   1   0   0   0   1
      id9   0   0   1   1   0   0   1   0
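
As a hedged cross-check against the question's 2006 example, where id3 and id4 share two projects, the sketch below rebuilds that matrix from the eight 2006 rows quoted in the question. It assumes that repeated (id, project) rows, such as the doubled id5, should count once, so it de-duplicates with `unique()` before tabulating, and it keeps the counts instead of reducing them to 0/1. The names `y2006` and `memb` and the explicit level list are introduced here for illustration.

```r
# The eight 2006 rows from the question, typed in by hand
y2006 <- data.frame(
  project = c("project1", "project1", "project1", "project1",
              "project4", "project4", "project5", "project5"),
  id      = c("id1", "id3", "id5", "id5", "id3", "id4", "id3", "id4")
)

# Keep id2 as a level so the matrix is 5 x 5 like the question's example
y2006$id <- factor(y2006$id, levels = paste0("id", 1:5))

memb <- unique(y2006[, c("id", "project")])  # count each (id, project) once
m <- table(memb$id, memb$project)            # 0/1 incidence, id x project
counts <- tcrossprod(m)                      # number of shared projects per pair
diag(counts) <- 0                            # no self-collaboration
counts
```

The id3/id4 entry should come out as 2 (projects 4 and 5), and the id1/id5 entry as 1 despite the duplicated id5 row.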