Tags: r, matrix-multiplication, lsa

Multiplying a matrix by a vector results in a matrix


I have a document-term matrix:

document_term_matrix <- as.matrix(DocumentTermMatrix(corpus, control = list(stemming = FALSE, stopwords=FALSE, minWordLength=3, removeNumbers=TRUE, removePunctuation=TRUE )))

For this document-term matrix, I've calculated the local and global term weights as follows:

lw_tf <- lw_tf(document_term_matrix)
gw_idf <- gw_idf(document_term_matrix)

lw_tf is a matrix with the same dimensionality as the document-term-matrix (nxm) and gw_idf is a vector of size n. However, when I run:

tf_idf <- lw_tf * gw_idf

The dimensionality of tf_idf is again nxm.

I did not expect this multiplication to work at all, since the dimensions are not conformable. Given this output, however, I would now expect gw_idf to have dimensions mxm. Is this indeed the case? And if so: what happened to the gw_idf vector of size n?


Solution

  • Matrix multiplication in R is done with %*%, not * (the latter is element-wise multiplication). Your reasoning is partially correct; you were just using the wrong operator.

    As for matrix multiplication itself: it is only possible if the second dimension of the first matrix equals the first dimension of the second matrix. The result has the first dimension of the first matrix and the second dimension of the second matrix.

    In your case, you're telling us you have a 1 x n matrix multiplied by an n x m matrix, which should result in a 1 x m matrix. You can check this with the following example:

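    # a is 1 x 100 and b is 100 x 200, so a %*% b is conformable
    a <- matrix(runif(100, 0, 1), nrow = 1, ncol = 100)
    b <- matrix(runif(100 * 200, 0, 1), nrow = 100, ncol = 200)
    
    # named ab rather than c to avoid masking base::c()
    ab <- a %*% b
    dim(ab)
    # [1]   1 200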
    

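    Now, about your specific case: I don't have the package that builds the document-term matrix (it would be nice of you to provide an easily reproducible example!). But since you're using *, you are multiplying an n x m matrix element-wise by a vector of length n, and R handles that by recycling the vector down the columns of the matrix, so every entry in row i is multiplied by gw_idf[i]. That is why the result is again n x m. It does not mean gw_idf has turned into an m x m matrix; it is still a vector of length n. If gw_idf were an n x 1 matrix rather than a plain vector, * would stop with a "non-conformable arrays" error instead.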
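
    To see what * does with a matrix and a plain vector, here is a minimal sketch using made-up stand-ins for lw_tf and gw_idf. The names lw and gw and all the numbers are assumptions for illustration, not output from the actual weighting functions. It relies only on base R's recycling rule, where the shorter vector is reused down the columns of the matrix:

    ```r
    # Toy stand-ins: lw plays the role of lw_tf (n x m), gw of gw_idf (length n)
    n <- 3
    m <- 4
    lw <- matrix(1:(n * m), nrow = n, ncol = m)
    gw <- c(10, 100, 1000)

    # Element-wise *: gw is recycled down each column, so row i is scaled by gw[i]
    tf_idf <- lw * gw
    dim(tf_idf)    # [1] 3 4
    tf_idf[2, ]    # row 2 of lw (2, 5, 8, 11) times 100

    # The same row scaling written explicitly with sweep()
    explicit <- sweep(lw, 1, gw, `*`)
    all.equal(tf_idf, explicit)

    # With an n x 1 matrix instead of a plain vector, * refuses to recycle
    gw_col <- matrix(gw, ncol = 1)
    msg <- tryCatch({
      lw * gw_col
      "no error"
    }, error = function(e) conditionMessage(e))
    msg
    ```

    If the row scaling is what you want, sweep() makes the intent visible to a reader; diag(gw) %*% lw would give the same numbers through real matrix multiplication, at the cost of building an n x n matrix.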