I'm trying to turn a character vector novel.lower.mid into a list of single words. So far, this is the code I've used:
midnight.words.l <- strsplit(novel.lower.mid, "\\W")
This produces a list of all the words. However, it splits everything, including contractions. The word "can't" becomes "can" and "t". How do I make sure those words aren't separated, or that the function just ignores the apostrophe?
We can use
library(stringr)
str_extract_all(novel.lower.mid, "\\b[[:alnum:]']+\\b")
Or
strsplit(novel.lower.mid, "(?!')\\W", perl=TRUE)