r, sparse-matrix

How to equalise the columns of two sparse matrices


I've got two sparse matrices, one for a training set and one for a test set, and I need to remove the columns in each that are not in the other, so that both end up with the same columns. At the moment I'm doing this with a loop, but I'm sure there is a more efficient way to do it:

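# take out features in the test set that are not in the training set
  library(Matrix)
  removecols <- integer(0)   # must be initialised before the loop
  i <- 0
  for (feature in colnames(testmatrix)) {
    i <- i + 1
    if (!(feature %in% colnames(trainmatrix))) {
      removecols <- c(removecols, i)
    }
  }
  # testmatrix[, -integer(0)] would drop every column, so only subset when needed
  if (length(removecols) > 0) {
    testmatrix <- testmatrix[, -removecols, drop = FALSE]
  }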

# and vice versa...
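The loop above is a membership test done one column at a time. The same test can be written without a loop: `%in%` returns one logical value per column name, and that logical vector can index the columns directly (unlike negative indexing, an all-TRUE logical index simply keeps every column). This is a minimal sketch, assuming the matrices come from the Matrix package, which the `@Dimnames` slot in the question suggests; the two small matrices are hypothetical example data. Each result keeps its own original column order, so the two matrices end up with the same set of columns but not necessarily in the same positions.

```r
library(Matrix)

# Hypothetical example data (not from the question)
trainmatrix <- sparseMatrix(i = c(1, 2, 2), j = c(1, 2, 3), x = c(1, 5, 2),
                            dims = c(2, 3),
                            dimnames = list(NULL, c("a", "b", "c")))
testmatrix  <- sparseMatrix(i = c(1, 2), j = c(1, 3), x = c(4, 7),
                            dims = c(2, 3),
                            dimnames = list(NULL, c("c", "a", "d")))

# TRUE for each test column whose name also occurs in the training set;
# this replaces both the loop and the index vector it accumulates
in_train <- colnames(testmatrix) %in% colnames(trainmatrix)
testmatrix_kept <- testmatrix[, in_train, drop = FALSE]

# ...and vice versa for the training set
in_test <- colnames(trainmatrix) %in% colnames(testmatrix)
trainmatrix_kept <- trainmatrix[, in_test, drop = FALSE]

print(colnames(testmatrix_kept))   # "c" "a"
print(colnames(trainmatrix_kept))  # "a" "c"
```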

Solution

  • To me it looks like all you want to do is keep the columns of testmatrix that also appear in trainmatrix, and vice versa. Since you want to apply this to both matrices, a quick way is to use intersect on the two vectors of column names to find the names they share, and then use that vector to subset both matrices. Subsetting by name also puts the columns in the same order in both results:

    #  keep will be a vector of colnames that appear in BOTH train and test matrices
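    keep <- intersect( colnames(testmatrix) , colnames(trainmatrix) )
    
    #  Then subset on this vector; drop = FALSE keeps a matrix even if only one column is left
    testmatrix <- testmatrix[ , keep, drop = FALSE ]
    trainmatrix <- trainmatrix[ , keep, drop = FALSE ]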
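A self-contained sketch of this approach on small hypothetical matrices, assuming the Matrix package's `sparseMatrix()` constructor. Base R's `intersect()` returns the shared names in the order they appear in its first argument, so subsetting both matrices by `keep` leaves them with identical column names in identical order.

```r
library(Matrix)

# Hypothetical example data (not from the question)
trainmatrix <- sparseMatrix(i = c(1, 2, 2), j = c(1, 2, 3), x = c(1, 5, 2),
                            dims = c(2, 3),
                            dimnames = list(NULL, c("a", "b", "c")))
testmatrix  <- sparseMatrix(i = c(1, 2), j = c(1, 3), x = c(4, 7),
                            dims = c(2, 3),
                            dimnames = list(NULL, c("c", "a", "d")))

# Names present in both matrices, in the order they appear in the test set
keep <- intersect(colnames(testmatrix), colnames(trainmatrix))

# Indexing by name gives both results the same columns in the same order;
# drop = FALSE keeps a matrix even if only one column survives
testmatrix  <- testmatrix[, keep, drop = FALSE]
trainmatrix <- trainmatrix[, keep, drop = FALSE]

print(keep)                                                    # "c" "a"
print(identical(colnames(testmatrix), colnames(trainmatrix)))  # TRUE
```

If you would rather keep the training set's column order, swap the two arguments to `intersect()`.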