I created a list of bigrams using:
library(tm)
library(RWeka)  # provides NGramTokenizer and Weka_control
BigramTokenizer <- function(x) NGramTokenizer(x, Weka_control(min = 2, max = 2))
tdm_a.bigram <- TermDocumentMatrix(docs_a,
                                   control = list(tokenize = BigramTokenizer))
I am trying to count the number of documents each bigram appears in. If I understand correctly, a term-document matrix gives how many times each bigram occurs within each document, but I only need 1 (present in a document) or 0 (absent).
How do I convert the term-document matrix into a data frame or matrix so that I can get such a count?
A TDM is a simple_triplet_matrix from the slam package, which has functions for common operations such as row and column sums (row_sums, col_sums).
slam::row_sums(tdm_a.bigram>=1)
This should tell you how many documents contained each bigram.
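To see what that line computes, here is a minimal sketch on a toy matrix. The bigrams, document names and counts are made up for illustration, and a plain simple_triplet_matrix stands in for the real tdm_a.bigram. It also shows the dense route through as.matrix, which answers the "convert to a matrix" part of the question but is only practical when the matrix fits in memory.

```r
library(slam)

# Hypothetical counts: 3 bigrams (rows) x 4 documents (columns).
counts <- matrix(c(2, 0, 1, 0,
                   0, 0, 3, 0,
                   1, 1, 1, 5),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("new york", "york city", "of the"),
                                 paste0("doc", 1:4)))

# Sparse triplet representation, the same class a TermDocumentMatrix builds on.
stm <- as.simple_triplet_matrix(counts)

# Sparse route: turn counts into presence flags, then sum across documents.
doc_freq <- row_sums(stm >= 1)

# Dense route: convert to an ordinary matrix first, then do the same.
doc_freq_dense <- rowSums(as.matrix(stm) >= 1)

print(doc_freq)
print(doc_freq_dense)
```

Both vectors should agree, with one entry per bigram. If a data frame is preferred, something like data.frame(bigram = names(doc_freq_dense), docs = doc_freq_dense) would wrap the result.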