r, tm, topic-modeling

How to change the values of a DocumentTermMatrix?


Reprex:

Suppose I have the dtm:

library(topicmodels)
data(AssociatedPress)

I am trying to assign a value of 0.001 to every entry that is 0.

Use case:

I get this error when I run LDA on my matrix:

Error in LDA(notSparse, k, method = "Gibbs", control = list(nstart = nstart, : Each row of the input matrix needs to contain at least one non-zero entry

I would like to see what happens if I turn the zeros into small values to reduce sparsity instead of using the dedicated function.


Solution

  • Changing your matrix values from 0 to 0.001 will not work with topicmodels::LDA. The function checks that every value in the input is an integer, so values such as 0.001 are not allowed. See the example below:

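    library(topicmodels)

    # One zero has been replaced with 0.001, so the input is no longer all integers
    m_replaced_zero <- matrix(c(1, 1, 0.001, 0), nrow = 2)
    LDA(m_replaced_zero)
    Error in !all.equal(x$v, as.integer(x$v)) : invalid argument type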
    

    Weirdly, the error you actually get means that at least one row of your matrix contains only zeros. That should not happen unless you removed some terms from your DocumentTermMatrix, which left a document without any term count of 1 or more. See the example below:

    m_zero_row <- matrix(c(1, 0, 1, 0), nrow = 2)
    m_zero_row
         [,1] [,2]
    [1,]    1    1
    [2,]    0    0
    LDA(m_zero_row)
    Error in LDA(m_zero_row) : 
      Each row of the input matrix needs to contain at least one non-zero entry
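
    The usual way around this error is to drop the empty documents rather than to fill in the zeros. Below is a minimal sketch on a toy matrix using base R only; the names m_toy, doc_totals, empty_docs and m_clean are made up for illustration. For a real DocumentTermMatrix, slam::row_sums should give the same per-document totals without converting to a dense matrix first.

    ```r
    # Toy document-term counts; document "d2" has lost all of its terms.
    m_toy <- matrix(c(1, 0, 2,
                      0, 0, 0,
                      3, 1, 0),
                    nrow = 3, byrow = TRUE,
                    dimnames = list(c("d1", "d2", "d3"), c("a", "b", "c")))

    # Total term count per document (named by document)
    doc_totals <- rowSums(m_toy)

    # Documents that would trigger the "at least one non-zero entry" error
    empty_docs <- names(doc_totals)[doc_totals == 0]

    # Keep only documents with at least one term
    m_clean <- m_toy[doc_totals > 0, , drop = FALSE]
    ```

    Here empty_docs holds the names of the offending rows, and m_clean is the input without them.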
    

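    But if you are set on replacing the zero entries in a DocumentTermMatrix, you first need to convert it to an ordinary (dense) matrix and then replace the 0s. Keep in mind that the result is no longer sparse and that LDA will still reject it because of the integer check described above.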

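    data("AssociatedPress")
    # Convert the sparse DocumentTermMatrix to a dense base R matrix
    m <- as.matrix(AssociatedPress)
    # Replace every zero count with 0.001
    m[m == 0] <- 0.001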
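
    To see what the replacement does, here is a small sketch on a toy matrix instead of the full AssociatedPress data. is_count_matrix and sparsity are hypothetical helpers written for this example; the first only approximates an integer check and is not the code that topicmodels runs internally.

    ```r
    m_toy <- matrix(c(2, 0, 0, 1), nrow = 2)

    # Same replacement as above, applied to the toy matrix
    m_filled <- m_toy
    m_filled[m_filled == 0] <- 0.001

    # Hypothetical helper: TRUE when every entry is a whole number
    is_count_matrix <- function(x) all(x == round(x))

    # Hypothetical helper: share of entries that are exactly zero
    sparsity <- function(x) mean(x == 0)

    is_count_matrix(m_toy)     # TRUE
    is_count_matrix(m_filled)  # FALSE
    sparsity(m_toy)            # 0.5
    sparsity(m_filled)         # 0
    ```

    The filled matrix has no zeros left, but it also no longer holds counts, which is why LDA refuses it.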