kwic in quanteda (R) does not identify more than one word in regex pattern

I am trying to identify regex patterns in text, but kwic() does not identify regex phrases that are longer than just one word. I tried to use phrase(), but that did not work either.

To give you an example:

mycorpus = corpus(bla$`TEXT` )
foo = kwic(mycorpus, pattern = "\\bno\\b", window = 10, valuetype = "regex" ) #gives 1959 obs. 
foo = kwic(mycorpus, pattern = "\\bno\\b\\s{0,5}\\w+", window = 10, valuetype = "regex" ) #gives 0 obs.
foo = kwic(mycorpus, pattern = "no\\sother", window = 10, valuetype = "regex" ) #gives 0 obs. even though it should find 3 phrases

The text contains several phrases that these multi-word patterns should match.

Thanks for the help!

Solution

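That's because kwic() matches the pattern against tokens one at a time, and tokens (by default) no longer contain spaces, so a regex that requires whitespace, such as "no\\sother", can never match. To search for a sequence of tokens, what quanteda treats as a "phrase", wrap the pattern in phrase(). It splits the pattern on whitespace, and each piece is then matched against one token, in order. (See also ?phrase.)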

library("quanteda")
## Package version: 2.0.0

txt <- "one two three four five"

# no match
kwic(txt, "one\\stwo", valuetype = "regex", window = 1)
## kwic object with 0 rows

# match
kwic(txt, phrase("one two"), valuetype = "regex", window = 1)
##                                 
##  [text1, 1:2]  | one two | three
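
Applied to the patterns in the question, a minimal sketch might look like the following. The example sentence is made up for illustration, and the anchors (^ and $) are my suggestion rather than something kwic() requires: with valuetype = "regex", an unanchored "no" also matches tokens such as "know" or "nothing", so anchoring each piece plays the role that \\b played in the original patterns.

```r
library("quanteda")

# Hypothetical example text, not the asker's data
toks <- tokens("There is no other way, and I know nothing other than this.")

# Unanchored: "no" is matched inside "know" and "nothing" as well,
# so this finds both "no other" and "nothing other"
kw_loose <- kwic(toks, phrase("no other"), valuetype = "regex", window = 2)
nrow(kw_loose)
## [1] 2

# Anchored: each piece must match a whole token, so only "no other" is found
kw_exact <- kwic(toks, phrase("^no$ ^other$"), valuetype = "regex", window = 2)
nrow(kw_exact)
## [1] 1

# "no" followed by any single word token (one way to express "\\bno\\b\\s{0,5}\\w+")
kw_any <- kwic(toks, phrase("^no$ ^\\w+$"), valuetype = "regex", window = 2)
nrow(kw_any)
## [1] 1
```

Tokenizing with tokens() first is a choice made here so the same code also runs on newer quanteda releases, where kwic() expects a tokens object rather than a character vector or corpus.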