
Quanteda: Document Feature Matrix with predefined set of features


I am using quanteda to build two document feature matrices:

library(quanteda)
DFM1 <- dfm("this is a rock")
#        features
# docs    this is a rock
#   text1    1  1 1    1
DFM2 <- dfm("this is music")
#        features
# docs    this is music
#   text1    1  1     1

However, I want DFM2 to have a specific set of features, namely the ones from DFM1:

DFM2 <- dfm("this is music", *magicargument* = featnames(DFM1))
#        features
# docs    this is a rock
#   text1    1  1 0    0

Is there a magicargument that I am missing? Or is there another efficient way to achieve this for large bags of words?


Solution

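  • The magic argument is the pattern argument of dfm_select(), to which you supply a dfm whose features will be matched. Features of that dfm that do not occur in the target dfm are added with zero counts, and features of the target dfm that are not in it (here "music") are dropped: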

    dfm_select(DFM2, pattern = DFM1)
    # Document-feature matrix of: 1 document, 4 features (50% sparse).
    # 1 x 4 sparse Matrix of class "dfmSparse"
    #        features
    # docs    this is a rock
    #   text1    1  1 0    0
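
In newer quanteda releases the same alignment is, to my knowledge, usually done with dfm_match(), which takes a character vector of feature names instead of a dfm. The sketch below assumes a recent quanteda version in which dfm() is built from a tokens object; the name DFM2_matched is only illustrative and not part of the original question.

```r
library(quanteda)

# Assumption: a recent quanteda release, where dfm() expects a tokens
# object and dfm_match() is available.
DFM1 <- dfm(tokens("this is a rock"))
DFM2 <- dfm(tokens("this is music"))

# Conform DFM2 to the feature set of DFM1: features missing from DFM2
# ("a", "rock") are added as zero columns, and features that are not
# in DFM1 ("music") are dropped.
DFM2_matched <- dfm_match(DFM2, features = featnames(DFM1))

# Inspect the aligned feature names and counts.
featnames(DFM2_matched)
as.matrix(DFM2_matched)
```

Because the columns of DFM2_matched follow the feature names passed in, the result can be used wherever the column layout of DFM1 is expected, for example when applying a model fitted on DFM1 to new documents.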