r, sparse-matrix

Replace values in a dfm sparse matrix


I'm using the quanteda package to produce a sparse matrix of word frequency counts. I want the output to be binary instead: 1 if the word is present in a document and 0 if it is not. I'm not sure how to do this with a sparse matrix.

install.packages("quanteda")  # the package name must be quoted
library(quanteda)

Example matrix

trainingset <- as.dfm(matrix(c(1, 2, 0, 0, 0, 0,
                    0, 2, 0, 0, 1, 0,
                    0, 1, 0, 1, 0, 0,
                    0, 1, 1, 0, 0, 1,
                    0, 3, 1, 0, 0, 1), 
                  ncol=6, nrow=5, byrow=TRUE,
                  dimnames = list(docs = paste("d", 1:5, sep = ""),
                                  features = c("Beijing", "Chinese",  "Japan", "Macao", 
                                               "Shanghai", "Tokyo"))))

Solution

  • If you look at str(trainingset), you can see the slots of the object. As with other sparse matrices, the x slot holds the stored values, so you can convert the counts to binary using

    # Replace every stored count with 1; entries that are not stored stay 0
    trainingset@x <- as.numeric(trainingset@x > 0)
    trainingset
    
    Document-feature matrix of: 5 documents, 6 features (60% sparse).
    5 x 6 sparse Matrix of class "dfmSparse"
        features
    docs Beijing Chinese Japan Macao Shanghai Tokyo
      d1       1       1     0     0        0     0
      d2       0       1     0     0        1     0
      d3       0       1     0     1        0     0
      d4       0       1     1     0        0     1
      d5       0       1     1     0        0     1
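To see why overwriting the x slot works, here is a minimal sketch that uses the Matrix package directly rather than quanteda. It assumes a plain sparse matrix (built with `Matrix(..., sparse = TRUE)`) is a fair stand-in for the dfm; the object names `counts` and `binary` are made up for illustration.

```r
library(Matrix)  # recommended package, included with standard R installations

# Same counts as the example matrix, stored as a generic sparse matrix.
# Assumption: this stands in for the dfm so the sketch runs without quanteda.
counts <- Matrix(c(1, 2, 0, 0, 0, 0,
                   0, 2, 0, 0, 1, 0,
                   0, 1, 0, 1, 0, 0,
                   0, 1, 1, 0, 0, 1,
                   0, 3, 1, 0, 0, 1),
                 nrow = 5, ncol = 6, byrow = TRUE, sparse = TRUE,
                 dimnames = list(paste0("d", 1:5),
                                 c("Beijing", "Chinese", "Japan",
                                   "Macao", "Shanghai", "Tokyo")))

# The x slot holds only the stored (non-zero) entries, column by column
counts@x

# Work on a copy and overwrite the stored values: positive counts become 1
binary <- counts
binary@x <- as.numeric(binary@x > 0)

# Convert to a dense matrix only to inspect the small example
as.matrix(binary)
```

Because zeros are never stored, only the 12 non-zero cells are touched, which keeps the operation cheap on large matrices. Recent quanteda releases also document `dfm_weight(x, scheme = "boolean")`, which should give the same result without modifying slots by hand; check the documentation for your installed version before relying on it.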