r - Binding sparse matrices of different sizes on rows


I am attempting to use the Matrix package to bind two sparse matrices of different sizes together. The binding is by row, using the column names for matching.

Table A:

ID     | AAAA   | BBBB   |
------ | ------ | ------ |
XXXX   | 1      | 2      |

Table B:

ID     | BBBB   | CCCC   |
------ | ------ | ------ |
YYYY   | 3      | 4      |

Result of binding tables A and B:

ID     | AAAA   | BBBB   | CCCC   |
------ | ------ | ------ | ------ |
XXXX   | 1      | 2      |        |
YYYY   |        | 3      | 4      |

The intention is to insert a large number of small matrices into a single large matrix, to enable continuous querying and update/inserts.

As far as I can tell, neither the Matrix nor the slam package has functionality to handle this.

Similar questions have been asked in the past, but it seems no solution has been found:

Post 1: in-r-when-using-named-rows-can-a-sparse-matrix-column-be-added-concatenated

Post 2: bind-together-sparse-model-matrices-by-row-names

Ideas on how to solve it will be highly appreciated.

Best regards,

Frederik


Solution

  • It looks as though empty columns (columns of 0s) have to be added to the matrices to make them compatible for rbind, i.e. so that both have the same column names in the same order. The following code does that:

    library(Matrix)

    # dummy data
    set.seed(3344)
    A = Matrix(matrix(rbinom(16, 2, 0.2), 4))
    colnames(A)=letters[1:4]
    B = Matrix(matrix(rbinom(9, 2, 0.2), 3))
    colnames(B) = letters[3:5]
    
    # finding the column names each matrix lacks relative to the other
    misA = colnames(B)[!colnames(B) %in% colnames(A)]
    misB = colnames(A)[!colnames(A) %in% colnames(B)]
    
    misAl = as.vector(numeric(length(misA)), "list")
    names(misAl) = misA
    misBl = as.vector(numeric(length(misB)), "list")
    names(misBl) = misB
    
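    # adding the missing columns (as zeros) to the initial matrices;
    # A and B are wrapped in list() so that c() builds an argument list
    # for cbind rather than trying to concatenate the matrix itself
    An = do.call(cbind, c(list(A), misAl))
    # reorder B's columns to match An before binding
    Bn = do.call(cbind, c(list(B), misBl))[, colnames(An)]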
    
    # final bind
    rbind(An, Bn)
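
Since the question is about inserting many small matrices into one large one, the same idea can be sketched for a whole list of matrices. This is only a sketch: `rbind_by_colnames` is a hypothetical helper name, not a function from the Matrix package. It assumes every input is a general numeric sparse matrix (for example a `dgCMatrix`) with column names. Instead of appending zero columns, it re-indexes the non-zero entries of each matrix into the union of all column names, so no explicit zeros are stored.

```r
library(Matrix)

# Hypothetical helper: row-bind a list of sparse matrices, matching columns by name.
# Assumes general numeric sparse matrices (e.g. dgCMatrix) that all have colnames.
rbind_by_colnames <- function(mats) {
  # union of all column names, in order of first appearance
  all_cols <- unique(unlist(lapply(mats, colnames)))
  expanded <- lapply(mats, function(m) {
    tm <- as(m, "TsparseMatrix")  # triplet form; slots i and j are 0-based
    sparseMatrix(
      i = tm@i + 1L,
      # map each entry's old column position to its position in all_cols
      j = match(colnames(m), all_cols)[tm@j + 1L],
      x = tm@x,
      dims = c(nrow(m), length(all_cols)),
      dimnames = list(rownames(m), all_cols)
    )
  })
  do.call(rbind, expanded)
}

# Tables A and B from the question
A <- sparseMatrix(i = c(1, 1), j = c(1, 2), x = c(1, 2),
                  dimnames = list("XXXX", c("AAAA", "BBBB")))
B <- sparseMatrix(i = c(1, 1), j = c(1, 2), x = c(3, 4),
                  dimnames = list("YYYY", c("BBBB", "CCCC")))
AB <- rbind_by_colnames(list(A, B))
AB
```

Calling `rbind` once on the full list is likely to be cheaper than growing the result one matrix at a time, because each incremental `rbind` copies the accumulated matrix.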