How to keep special symbols like "(" "," and "#" in tokens in R?

I'm dealing with a text file of job ads that contains words like "c#", "c++", and ".net". When I convert it into tokens, the "#", the "++", and the dot are removed. How can I keep them in the resulting tokens? Here is my code:

unnest_tokens(word, REQUIREMENTS, token = "words", to_lower = TRUE)

Solution

The problem is the argument token = "words", which splits the text at word boundaries and discards punctuation (the effect is much like splitting on the regex \\W+). Because the delimiters are thrown away, you need a different tokenizer setting to keep those characters. One option is to define your own splitting regex with token = "regex", something like this:

unnest_tokens(word,
              REQUIREMENTS,
              token = "regex",
              to_lower = TRUE,
              pattern = "\\s+") # split on whitespace rather than non-word elements

This way, you can define whatever regex you need to customize how the text is tokenized.
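
To see why the choice of pattern matters, here is a minimal sketch in base R. It uses strsplit() as a stand-in for the tokenizer, which is an assumption made only so the example runs without extra packages; the sample sentence is made up, and the real behaviour of unnest_tokens() should be checked on your own data.

```r
# Hypothetical sample line, lowercased as to_lower = TRUE would do
x <- tolower("Experience with C#, C++ and .NET")

# Splitting on runs of non-word characters drops "#", "++" and "."
by_nonword <- strsplit(x, "\\W+", perl = TRUE)[[1]]

# Splitting on runs of whitespace keeps them attached to the token
by_space <- strsplit(x, "\\s+", perl = TRUE)[[1]]

print(by_nonword)  # "experience" "with" "c" "c" "and" "net"
print(by_space)    # "experience" "with" "c#," "c++" "and" ".net"
```

Note that a pure whitespace split also keeps the trailing comma in "c#,". If you want to drop commas but keep "#", "+" and ".", you could widen the pattern, for example to "[\\s,]+".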