In the corpus "tkn_pb", I would like to delete all words except for some keywords I chose (e.g. "attack" and "gunman"). Is it possible to do this?
You can use which() and grepl() to subset your corpus:
Data:
sample_tokens <- c("word", "another", "a", "new", "word token", "one", "more", "and", "another one")
Remove all words except "a" and "and":
# "\\b" marks a word boundary, so "another" is not matched by "a"
sample_tokens[which(grepl("\\b(a|and)\\b", sample_tokens))]
[1] "a" "and"
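For the keywords from the question, a minimal sketch (assuming the tokens are a plain character vector; `keywords` and `tokens` are made-up example data) shows two ways to do the same thing. `%in%` keeps only elements that equal a keyword exactly, while the regex can be built from the keyword vector instead of being typed by hand:

```r
# Keywords from the question (example data)
keywords <- c("attack", "gunman")

# Hypothetical example tokens, assumed to be a plain character vector
tokens <- c("the", "gunman", "fled", "after", "the", "attack", "attacks")

# Option 1: exact matching with %in%, no regular expressions needed
kept_exact <- tokens[tokens %in% keywords]
print(kept_exact)

# Option 2: build the word-boundary pattern from the keyword vector
pattern <- paste0("\\b(", paste(keywords, collapse = "|"), ")\\b")
kept_regex <- tokens[grepl(pattern, tokens)]
print(kept_regex)
```

Both return "gunman" and "attack" here. They differ for multi-word tokens: grepl() keeps an element such as "the gunman" whole because it contains the keyword as a word, whereas %in% only keeps elements identical to a keyword. If a keyword contains regex metacharacters (e.g. "."), it would need escaping before going into the pattern.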
EDIT:
If the corpus is a list of character vectors (one per document), the solution suggested by @John works: apply the same subsetting to each element with lapply():
Data:
sample_tokens <- list(c("word", "another", "a", "new", "word token", "one", "more", "and", "another one"),
                      c("yet", "a", "few", "more", "words"),
                      c("and", "so on"))
# Apply the same filter to each document in the list
lapply(sample_tokens, function(x) x[which(grepl("\\b(a|and)\\b", x))])
[[1]]
[1] "a" "and"
[[2]]
[1] "a"
[[3]]
[1] "and"
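The same idea can be wrapped in a small function. Here is a sketch under the assumption that the corpus is a list of character vectors. `keep_keywords()` is a hypothetical helper name, not part of any package, and it uses exact matching with %in%:

```r
# Hypothetical helper: keep only the keywords in every document of a list
keep_keywords <- function(docs, keywords) {
  # lapply() returns a list with one filtered character vector per document
  lapply(docs, function(x) x[x %in% keywords])
}

# Hypothetical example corpus: a list of character vectors
docs <- list(c("a", "gunman", "opened", "fire"),
             c("no", "keywords", "here"),
             c("the", "attack", "and", "the", "gunman"))

kept <- keep_keywords(docs, c("attack", "gunman"))
print(kept)

# Documents without any keyword become character(0); drop them if unwanted
non_empty <- kept[lengths(kept) > 0]
```

Keeping the empty documents preserves the positions of the original list, which matters if the list is matched against document metadata. Dropping them with lengths() is only appropriate when that alignment is not needed.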