For a research project, I have read PDF documents into R and created a corpus and a TermDocumentMatrix. I want to check the frequency of specific words in each document in my corpus. The code below gives me the kind of matrix I want, with word frequencies by document, but it only covers high-frequency terms, not the specific terms I choose.
ft <- findFreqTerms(opinions.tdm, lowfreq = 100, highfreq = Inf)
as.matrix(opinions.tdm[ft, ])
I found the code below in another answer. It lets me look up the frequency of specific terms, but it sums the counts across all documents. How do I adapt it so that I get the counts for specific terms within each document rather than across the whole corpus?
library(tm)
data("crude")
crude <- as.VCorpus(crude)
crude <- tm_map(crude, stripWhitespace)
crude <- tm_map(crude, removePunctuation)
crude <- tm_map(crude, content_transformer(tolower))
crude <- tm_map(crude, removeWords, stopwords("english"))
tdm <- TermDocumentMatrix(crude)
# turn tdm into dense matrix and create frequency vector.
freq <- rowSums(as.matrix(tdm))
freq["crude"]
# crude
#    21
freq["oil"]
# oil
#  85
Skip the rowSums part and just index the matrix directly. Each row holds one term's counts, with one column per document:
term_matrix <- as.matrix(tdm)
term_matrix["crude",]
# 127 144 191 194 211 236 237 242 246 248 273 349 352 353 368 489
#   2   0   2   3   0   2   0   0   0   0   5   2   0   2   0   0
# 502 543 704 708
#   0   2   0   1
term_matrix["oil",]
# 127 144 191 194 211 236 237 242 246 248 273 349 352 353 368 489
#   5  12   2   1   1   7   3   3   5   9   5   4   5   4   3   4
# 502 543 704 708
#   5   3   3   1