I am trying to remove one- and two-character tokens.
Here is an example:
library(quanteda)
toks <- tokens(c("This is a sentence. This is a second sentence."), remove_punct = TRUE)
toks <- tokens_select(toks, min_nchar = 1L, max_nchar = 2L, selection = "remove")
toks
Results:
tokens from 1 document.
text1 :
[1] "is" "a" "is" "a"
I expected to get the tokens that do not meet the length criteria, not the ones that do.
It looks like the selection argument is ignored.
Applied to the original tokens, this gives the result I wanted:
toks <- tokens_select(toks, min_nchar=3L, max_nchar=79L)
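To make the expected result explicit, the same length filter can be sketched in base R, without quanteda. This is only a sanity check on what I expect to be kept; `words` and `long_words` are hand-written names for illustration, and `words` is typed in by hand rather than taken from quanteda output.

```r
# Hand-written stand-in for the tokens of the example text (punctuation removed).
words <- c("This", "is", "a", "sentence", "This", "is", "a", "second", "sentence")

# Keep only tokens that are at least 3 characters long,
# i.e. drop the one- and two-character tokens.
long_words <- words[nchar(words) >= 3]

print(long_words)
# [1] "This"     "sentence" "This"     "second"   "sentence"
```

This is the set of tokens I expected from the call with selection = "remove".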