Tags: r, text-mining, tm, quanteda

Assigning weights to different features in R


Is it possible to assign weights to different features before building a document-feature matrix (DFM) in R?

Consider this example in R:

    str = "apple is better than banana"
    mydfm = dfm(str, ignoredFeatures = stopwords("english"), verbose = FALSE)

The resulting DFM, mydfm, looks like:

docs apple better banana
text1  1      1     1

But I want to assign weights (apple: 5, banana: 3) beforehand, so that mydfm looks like:

docs apple better banana
text1  5      1     3

Solution

  • I don't think so; however, you can easily do it afterwards:

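    library(quanteda)
    str <- "apple is better than banana"
    mydfm <- dfm(str, ignoredFeatures = stopwords("english"), verbose = FALSE)
    # named weights, as given in the question
    weights <- c(apple = 5, banana = 3)
    # positions of the weights whose names are features of the DFM
    idx <- which(names(weights) %in% colnames(mydfm))
    # scale each matching column by its weight
    mydfm[, names(weights)[idx]] <- mydfm[, names(weights)[idx]] %*% diag(weights[idx])
    mydfm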
    # 1 x 3 sparse Matrix of class "dgCMatrix"
    #        features
    # docs    apple better banana
    #   text1     5      1      3
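
The same idea can be sketched in base R without quanteda, which makes the column scaling easier to see. This is only an illustration under stated assumptions: `counts` is a plain matrix standing in for the DFM, and the extra `cherry` weight is a made-up entry showing that weights for absent features are ignored.

```r
# Plain base-R matrix standing in for the document-feature matrix
# (an assumption for illustration; a real dfm is a sparse object).
counts <- matrix(c(1, 1, 1), nrow = 1,
                 dimnames = list(docs = "text1",
                                 features = c("apple", "better", "banana")))

# Hypothetical named weights; "cherry" is not a feature of counts.
weights <- c(apple = 5, banana = 3, cherry = 2)

# One multiplier per column, defaulting to 1 for unweighted features.
multipliers <- setNames(rep(1, ncol(counts)), colnames(counts))
shared <- intersect(names(weights), colnames(counts))
multipliers[shared] <- weights[shared]

# Multiply every column by its multiplier.
weighted <- sweep(counts, 2, multipliers, `*`)
print(weighted)
```

Using a full-length multiplier vector with `sweep()` also avoids a pitfall of the `diag()` approach: when only one weight matches, `diag()` receives a single number and builds an identity matrix of that size instead of a 1 x 1 matrix holding the weight. If your quanteda version is recent enough, `dfm_weight()` with its `weights` argument (a named numeric vector) may do this directly, but check the documentation for your installed release.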