I have a dictionary with multiple subcategories and I would like to find the most frequent words and bigrams within each subcategory using R.
I am using a large dataset, but here's an example of what my data looks like:
s <- "Day after day, day after day,
We stuck, nor breath nor motion;"
library(stringi)
x <- stri_replace_all(s, "", regex = "<.*?>")
x <- stri_trim(x)
x <- stri_trans_tolower(x)
library(quanteda)
toks <- tokens(x)
toks <- tokens_wordstem(toks)
dtm <- dfm(toks, tolower = TRUE)
dtm <- dfm_remove(dtm, stopwords("english"))
dict1 <- dictionary(list(a=c("day*", "week*", "month*"),
b=c("breath*","motion*")))
dict_dtm2 <- dfm_lookup(dtm, dict1, nomatch="_unmatched")
tail(dict_dtm2)
This gives me the total frequencies per subcategory but not the frequency of each individual word within these subcategories. The results I am looking for would look something like this:
words(a) freq
day 4
week 0
month 0
words(b) freq
breath 1
motion 1
I would appreciate any help with that!
As far as I understand your question, I believe you are looking for the table()
function. You will need a bit of regular-expression work to clean the first sentence, but I believe you can do it. One idea would be the following:
s <- "day after day day after day We stuck nor breath nor motion"
s <- strsplit(s, "\\s+")[[1]]  # character vector of words
dict <- list(a = c("day", "week", "month"),
             b = c("breath", "motion"))
lapply(dict, function(words) {
  found <- intersect(words, s)
  table(s)[found]
})
# $a
# s
# day 
#   4 
#
# $b
# s
# breath motion 
#      1      1 
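Note that intersect() drops the words that never occur, so week and month disappear instead of showing a 0. If you want the zero counts from your desired output, one way (a sketch, using the same s and dict as above) is to tabulate against a factor whose levels are fixed to the dictionary words:

```r
# Tokenize the sample sentence into a character vector of words
s <- strsplit("day after day day after day We stuck nor breath nor motion", "\\s+")[[1]]
dict <- list(a = c("day", "week", "month"),
             b = c("breath", "motion"))

# factor(..., levels = words) forces every dictionary word to appear
# in the table, so unseen words get an explicit count of 0
lapply(dict, function(words) table(factor(s, levels = words)))
```

For this sentence, key a yields day 4, week 0, month 0, and key b yields breath 1, motion 1.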
I hope this helps. Cheers!
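If you would rather stay inside quanteda and reuse the dtm and dict1 from your question, here is a sketch along the same lines: select the features matching each dictionary key and sum their columns. (As with intersect(), stems absent from the dfm simply won't appear, so there are no explicit zeros here.)

```r
library(quanteda)

# Rebuild the dfm roughly as in the question: lowercase, tokenize, stem,
# drop English stopwords
toks <- tokens_wordstem(tokens(tolower(
  "Day after day, day after day, We stuck, nor breath nor motion;")))
dtm <- dfm_remove(dfm(toks), stopwords("english"))

dict1 <- dictionary(list(a = c("day*", "week*", "month*"),
                         b = c("breath*", "motion*")))

# For each dictionary key, keep only its matching features and
# sum each feature's column to get per-word counts within the key
lapply(names(dict1), function(key) colSums(dfm_select(dtm, pattern = dict1[key])))
```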