I'm doing text analysis in R. Is there a way to remove all the words that don't start with a capital letter, using tm or stringi?
If I have something like this:
Albert Einstein went to the store and saw his friend Nikola Tesla ... + 200 pages
I'd like it to be converted into:
Albert Einstein Nikola Tesla
Best regards
Just use grep and a regular expression:
words <- 'Albert Einstein went to the store and saw his friend Nikola Tesla'
# split to vector of individual words
vec <- unlist(strsplit(words, ' '))
# just the capitalized ones
caps <- grep('^[A-Z]', vec, value = TRUE)
# assemble back to a single string, if you want
paste(caps, collapse=' ')
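For a longer text (such as the 200 pages mentioned in the question), splitting on single spaces leaves punctuation attached to the words. A minimal base R sketch that extracts the capitalized words directly might look like the following. The helper name `keep_capitalized` and the sample `pages` vector are made up for illustration, and the pattern assumes plain ASCII letters.

```r
# Hypothetical helper (not part of the original answer): for each element of a
# character vector, keep only the words that start with an uppercase ASCII letter.
keep_capitalized <- function(text) {
  # Find every run of letters that begins with A-Z at a word boundary
  matches <- regmatches(text, gregexpr("\\b[A-Z][A-Za-z]*", text, perl = TRUE))
  # Join the matches found in each element back into a single string
  vapply(matches, paste, character(1), collapse = " ")
}

# Made-up sample input, one string per "page"
pages <- c("Albert Einstein went to the store.",
           "He saw his friend Nikola Tesla, who waved.")
keep_capitalized(pages)
# [1] "Albert Einstein" "He Nikola Tesla"
```

Note that words capitalized only because they start a sentence ("He" above) are kept as well, so picking out actual names would need an extra filtering step.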