
Combine matrix row / column names in R


I have multiple matrices reflecting bipartite / affiliation networks at different time points. These matrices have a lot of overlap in their incumbents, but also a lot of differences. For further analysis, however, I need them to be the same dimensions and have the same actors per row/column, so I need to combine row and column names somehow.

The final matrices will be around 8000 by 200, but each individual matrix is around 2000 by 150. Here is an example of two matrices and what I want the result to look like:

adj1 <- matrix(0, 3, 5)
colnames(adj1) <- c("g1", "g2", "g3", "g5", "g6")
rownames(adj1) <- c("Tim", "John", "Sarah")

adj2 <- matrix(0, 4, 2)
colnames(adj2) <- c("g1", "g4")
rownames(adj2) <- c("Tim", "Mary", "John", "Paolo")

combined_adj <- matrix(0,5,6)
colnames(combined_adj) <- c("g1","g2","g3","g4","g5","g6")
rownames(combined_adj) <- c("John","Mary","Paolo","Sarah","Tim")

Ideally, the new cells should read "NA" or "10", and rows and columns should be ordered alphabetically. The initial values in each matrix need to be kept. I am at a loss as to what to do here and would appreciate any help!


Solution

  • You can use merge and specify that you want to use row.names for merging as well.

    combined_adj <- merge(x = adj1,
          y = adj2,
          by = c('row.names', 
                 intersect(colnames(adj1), 
                           colnames(adj2))
                 ), 
          all = TRUE
    )
    combined_adj
      Row.names g1 g2 g3 g5 g6 g4
    1      John  0  0  0  0  0  0
    2      Mary  0 NA NA NA NA  0
    3     Paolo  0 NA NA NA NA  0
    4     Sarah  0  0  0  0  0 NA
    5       Tim  0  0  0  0  0  0
    

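    This turns it into a data.frame, so move the names back into the row names and convert the result to a matrix if you need one. The rows already come back sorted by name; ordering the columns gives the alphabetical layout asked for.

    row.names(combined_adj) <- combined_adj[, 1]
    combined_adj <- as.matrix(combined_adj[, -1])
    combined_adj <- combined_adj[, order(colnames(combined_adj))]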
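
One caveat with merging on the shared columns: merge matches rows on the values in those columns, so an actor whose value in a shared column differs between two matrices ends up in two separate rows. If what you need is each time point kept as its own matrix, just with identical dimensions, a different route is to pad every matrix to the union of all row and column names. The following is only a sketch of that alternative, not part of the merge approach above; `pad_to` is a hypothetical helper name, and `NA` is assumed as the fill value.

```r
# Hypothetical helper: expand a matrix to a given set of row and column names,
# filling cells that did not exist in the original with `fill`.
pad_to <- function(mat, rows, cols, fill = NA) {
  out <- matrix(fill, length(rows), length(cols), dimnames = list(rows, cols))
  out[rownames(mat), colnames(mat)] <- mat  # keep the original values
  out
}

adj1 <- matrix(0, 3, 5,
               dimnames = list(c("Tim", "John", "Sarah"),
                               c("g1", "g2", "g3", "g5", "g6")))
adj2 <- matrix(0, 4, 2,
               dimnames = list(c("Tim", "Mary", "John", "Paolo"),
                               c("g1", "g4")))

mats <- list(adj1, adj2)

# union of all names, sorted alphabetically
all_rows <- sort(unique(unlist(lapply(mats, rownames))))
all_cols <- sort(unique(unlist(lapply(mats, colnames))))

# one padded matrix per time point, all with identical dimnames
padded <- lapply(mats, pad_to, rows = all_rows, cols = all_cols)
dim(padded[[1]])
```

Every element of `padded` then has the same actors and groups in the same order, and cells that were absent from a given time point hold the fill value.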
    

    Edit: Merge multiple matrices

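    To merge more than two matrices, use Reduce to apply merge successively over a list of them. First convert each matrix to a data.frame and store its row names in a regular column (row_names), so that merge can match on that column together with any shared group columns.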

    # create sample data
    adj1 <- matrix(
      0, 3, 5,
      dimnames = list(c("Tim", "John", "Sarah"), 
                      c("g1", "g2", "g3", "g5", "g6"))
    )
    
    adj2 <- matrix(
      0, 4, 2, 
      dimnames = list(c("Tim", "Mary", "John", "Paolo"),
                      c("g1", "g4"))
    )
    
    adj3 <- matrix(
      0, 3, 3, 
      dimnames = list(c("Tim2", "Mary2", "John"), c("g1", "g4", 'g7'))
    )
    
    # create a list 
    list_matrices <- list(adj1, adj2, adj3)
    
    # convert to dataframes and create a column with row.names
    list_matrices <- lapply(list_matrices, function(mat){
      mat <- as.data.frame(mat)
      mat$row_names <- row.names(mat)
      mat
    })
    
    # successively combine them, merge 1..2 and then merge result with 3 and so on
    res <- Reduce(function(mat1, mat2) merge(mat1, mat2, all = TRUE), x = list_matrices)
    
    res
      g1 row_names g4 g2 g3 g5 g6 g7
    1  0      John  0  0  0  0  0  0
    2  0      Mary  0 NA NA NA NA NA
    3  0     Mary2  0 NA NA NA NA  0
    4  0     Paolo  0 NA NA NA NA NA
    5  0     Sarah NA  0  0  0  0 NA
    6  0       Tim  0  0  0  0  0 NA
    7  0      Tim2  0 NA NA NA NA  0
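
As in the two-matrix case, the result is a data.frame, here with the names in the row_names column. A minimal sketch of turning it back into a matrix with alphabetically ordered rows and columns could look like this; the small `res` below is a made-up stand-in with the same layout as the output above, so the snippet runs on its own.

```r
# Stand-in for `res` (assumed layout: a row_names column plus value columns)
res <- data.frame(
  g1        = c(0, 0, 0),
  row_names = c("Mary", "John", "Tim"),
  g4        = c(0, 0, NA),
  g2        = c(NA, 0, 0)
)

# drop the name column, convert to a numeric matrix, restore the row names
mat <- as.matrix(res[, setdiff(names(res), "row_names")])
rownames(mat) <- res$row_names

# order rows and columns alphabetically
mat <- mat[order(rownames(mat)), order(colnames(mat))]
mat
```

Note that `order()` on names sorts them as text, so with ten or more groups a name like g10 would come before g2.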