text-mining quanteda

Quanteda merging unigrams and bigrams


I want to test whether having both unigrams and bigrams in one DFM improves my document classification, so I would like to create both in a single DFM. From there, I can compute a TF-IDF weighted DFM that covers both unigrams and bigrams. I could create the unigram and bigram DFMs separately and then merge them, but I would like to know if quanteda has a more efficient way of doing this. I appreciate your responses.


Solution

  • Got it from the quanteda page. Passing n = 1:2 to tokens_ngrams() produces unigrams and bigrams in a single tokens object, so something like this works:

    # toks is an existing tokens object, e.g. from tokens()
    toks_ngram <- tokens_ngrams(toks, n = 1:2)
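
For completeness, here is a minimal sketch of the whole pipeline, from raw text to a TF-IDF weighted DFM containing both unigrams and bigrams. The two example texts and all variable names (`txt`, `toks_ngram`, `dfmat`, `dfmat_tfidf`) are made up for illustration; swap in your own corpus and tokenization options.

```r
library(quanteda)

# Toy documents, invented for illustration only
txt <- c(d1 = "text mining is fun",
         d2 = "text classification with quanteda")

# Tokenize; adjust the options to match your own preprocessing
toks <- tokens(txt, remove_punct = TRUE)

# n = 1:2 keeps the unigrams and adds bigrams
# (bigram parts are joined with "_" by default)
toks_ngram <- tokens_ngrams(toks, n = 1:2)

# One DFM holding both feature types
dfmat <- dfm(toks_ngram)

# TF-IDF weighting over the combined feature set
dfmat_tfidf <- dfm_tfidf(dfmat)

# Inspect the features: unigrams and bigrams side by side
featnames(dfmat)
```

Because the n-grams are generated at the tokens stage, there is no need to build two DFMs and merge them afterwards; the weighting step then sees unigram and bigram features in one matrix.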