r quanteda

How to create interaction features with quanteda?


Consider the following example:

library(quanteda)
library(tidyverse)

tibble(text = c('the dog is growing tall',
                'the grass is growing as well')) %>% 
  corpus() %>% dfm()
Document-feature matrix of: 2 documents, 8 features (31.2% sparse).
       features
docs    the dog is growing tall grass as well
  text1   1   1  1       1    1     0  0    0
  text2   1   0  1       1    0     1  1    1

I would like to create an interaction between dog and the other tokens in each sentence. That is, I want to create the features the-dog, is-dog, growing-dog and tall-dog and add them to the dfm, alongside the features it already has.

For instance, the-dog would equal 1 if both the and dog are present in the sentence, and 0 otherwise. So the-dog would be 1 for the first sentence and 0 for the second.

Note that I only need interaction terms for tokens that occur in a sentence containing dog, so an interaction such as grass-dog is not required here.
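
To make the definition concrete, here is a minimal base R sketch of the indicator being asked for. It does not use quanteda; it rebuilds the toy presence matrix by hand, and the names `present`, `target` and `inter` are illustrative assumptions rather than anything from the package.

```r
# Toy presence/absence matrix mirroring the dfm above (1 = token occurs).
docs <- c(text1 = "the dog is growing tall",
          text2 = "the grass is growing as well")
toks <- strsplit(docs, " ", fixed = TRUE)
vocab <- unique(unlist(toks))

# rows = documents, columns = tokens
present <- t(vapply(toks, function(x) as.integer(vocab %in% x),
                    integer(length(vocab))))
colnames(present) <- vocab

# Interaction with "dog": multiply every other column by the dog column,
# so a cell is 1 only when both the token and "dog" occur in the document.
target <- "dog"
others <- setdiff(vocab, target)
inter <- present[, others, drop = FALSE] * present[, target]
colnames(inter) <- paste(others, target, sep = "-")

# Drop interactions that are zero everywhere (tokens never seen with "dog").
inter <- inter[, colSums(inter) > 0, drop = FALSE]
inter
```

On this toy input the surviving columns are the-dog, is-dog, growing-dog and tall-dog, all 1 for text1 and 0 for text2. A dense matrix like this would not scale to a real corpus, which is why a sparse, quanteda-native approach is the goal.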

How can I do that efficiently in quanteda?


Solution

    library("quanteda")
    ## Package version: 2.1.2
    
    toks <- tokens(c(
      "the dog is growing tall",
      "the grass is growing as well"
    ))
    
    # keep "dog" plus up to 1e5 tokens on either side of it, i.e. every token
    # in documents that contain "dog"; documents without "dog" become empty
    toks_dog <- tokens_select(toks, "dog", window = 1e5)
    
    # create the dfm and label other terms as interactions with dog
    dfmat_dog <- dfm(toks_dog) %>%
      dfm_remove("dog")
    colnames(dfmat_dog) <- paste(featnames(dfmat_dog), "dog", sep = "-")
    dfmat_dog
    ## Document-feature matrix of: 2 documents, 4 features (50.00% sparse) and 0 docvars.
    ##        features
    ## docs    the-dog is-dog growing-dog tall-dog
    ##   text1       1      1           1        1
    ##   text2       0      0           0        0
    
    # combine with other features
    print(cbind(dfm(toks), dfmat_dog), max_nfeat = -1)
    ## Document-feature matrix of: 2 documents, 12 features (37.50% sparse) and 0 docvars.
    ##        features
    ## docs    the dog is growing tall grass as well the-dog is-dog growing-dog
    ##   text1   1   1  1       1    1     0  0    0       1      1           1
    ##   text2   1   0  1       1    0     1  1    1       0      0           0
    ##        features
    ## docs    tall-dog
    ##   text1        1
    ##   text2        0
    

    Created on 2021-03-18 by the reprex package (v1.0.0)
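
The steps above could be wrapped in a small helper so that the target word is a parameter. This is only a sketch under a few assumptions: `interaction_dfm()` is a hypothetical name and not part of quanteda, a window of 1e5 is assumed to exceed the longest document, and `dfm_weight(scheme = "boolean")` is added so that a token occurring more than once alongside the target still gives a 0/1 indicator, as the question asks. A third example sentence is added here for illustration.

```r
library("quanteda")

# Hypothetical helper (not part of quanteda): build "<feature>-<target>"
# presence indicators for documents that contain `target`.
interaction_dfm <- function(toks, target, sep = "-") {
  # keep every token in documents containing `target`; others become empty
  toks_target <- tokens_select(toks, target, valuetype = "fixed", window = 1e5)
  dfmat <- dfm_remove(dfm(toks_target), target, valuetype = "fixed")
  # recode counts to 0/1 so repeated tokens still yield an indicator
  dfmat <- dfm_weight(dfmat, scheme = "boolean")
  colnames(dfmat) <- paste(featnames(dfmat), target, sep = sep)
  dfmat
}

toks <- tokens(c(
  "the dog is growing tall",
  "the grass is growing as well",
  "the dog chased the other dog"  # extra illustrative sentence
))

# original features plus the interaction indicators
dfmat_all <- cbind(dfm(toks), interaction_dfm(toks, "dog"))
print(dfmat_all, max_nfeat = -1)
```

In the third sentence "the" occurs twice, so without the boolean weighting the the-dog cell would hold a count of 2 rather than an indicator. Whether counts or indicators are preferable depends on the downstream model.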