Keeping punctuation in R Document Term Matrix

I'm trying to build a DocumentTermMatrix in R, using the control = list() parameter to limit the terms to a predefined list of text-based emojis (:D, :), :(, etc.). However, the resulting dtm doesn't pick up certain emojis (like ":D" or ":)"), while others (like ":))") work fine. My code:

library(tm)
text = c(":D", ":))")
corpus <- Corpus(VectorSource(text))
corpus = tm_map(corpus, PlainTextDocument)
dtm = DocumentTermMatrix(corpus, list(dictionary = c(":D", ":))")))
emojidf <- as.data.frame(as.matrix(dtm))

  :D :))
1  0   0
2  0   1

As a workaround, I could use content_transformer and gsub to replace the problematic emojis with words. However, I'd like to understand how DocumentTermMatrix (or even Corpus) decides whether punctuation counts as a word.
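
The gsub-based replacement mentioned above could look roughly like the sketch below. It uses base R only; the replacement words and the helper name replace_emojis are made up for illustration, and the third input string is an extra example that is not in my data. Longer emojis are replaced first, because ":))" starts with ":)" and would otherwise be split.

```r
text <- c(":D", ":))", "nice :) day")

# Hypothetical emoji-to-word mapping (the replacement words are arbitrary).
emoji_words <- c(":))" = "emojibiggrin", ":D" = "emojilaugh", ":)" = "emojismile")

# Sort by pattern length, longest first, so ":))" is handled before ":)".
emoji_words <- emoji_words[order(nchar(names(emoji_words)), decreasing = TRUE)]

# Hypothetical helper: replace each emoji literally (fixed = TRUE, no regex).
replace_emojis <- function(x, mapping) {
  for (pattern in names(mapping)) {
    x <- gsub(pattern, mapping[[pattern]], x, fixed = TRUE)
  }
  x
}

result <- replace_emojis(text, emoji_words)
print(result)
```

The same function could then be wrapped in content_transformer and passed to tm_map, but that still wouldn't tell me why the original emojis are dropped.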

Solution

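Two defaults are responsible (see ?DocumentTermMatrix and ?termFreq). First, the wordLengths filter keeps only terms of at least 3 characters by default, which drops ":D" and ":)" but keeps ":))". Second, tolower is enabled by default and turns :D into :d, which no longer matches the dictionary entry ":D". So try: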

library(tm)
text <- c(":D", ":))" ) 
corpus <- Corpus(VectorSource(text))
dtm <- DocumentTermMatrix(
  corpus, 
  control = list(
    dictionary = c(":D" , ":))"), 
    wordLengths=c(-Inf,Inf), 
    tolower=FALSE
  )
)
as.matrix(dtm)
#     Terms
# Docs :)) :D
#    1   0  1
#    2   1  0
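
To see why only ":D" was lost in the question, the two defaults can be imitated in base R. This is a sketch, not a call into tm: min_length stands in for the default lower bound of wordLengths, and the variable names are illustrative.

```r
terms <- c(":D", ":))")

# Stand-in for the default lower bound of wordLengths (3 characters).
min_length <- 3

# Which terms are long enough to pass the length filter?
survives_length <- nchar(terms) >= min_length

# Which terms still equal a dictionary entry after lower-casing?
lowered <- tolower(terms)
matches_after_lowering <- lowered %in% terms

print(data.frame(term = terms, lowered, survives_length, matches_after_lowering))
```

":D" fails both checks (2 characters, and it becomes ":d"), while ":))" passes both, which is consistent with the counts shown in the question.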