I came across the question "PHP explode the string, but treat words in quotes as a single word" and similar ones about using a regex to split a sentence into space-separated words while keeping quoted text intact (as a single word).
I would like to do the same in R. I tried copy-pasting the regular expression into stri_split in the stringi package, as well as into strsplit in base R, but I suspect the regular expression uses a format R does not recognize. The error is:
Error: '\S' is an unrecognized escape in character string...
The desired output would be:
mystr <- '"preceded by itself in quotation marks forms a complete sentence" preceded by itself in quotation marks forms a complete sentence'
myfoo(mystr)
[1] "preceded by itself in quotation marks forms a complete sentence" "preceded" "by" "itself" "in" "quotation" "marks" "forms" "a" "complete" "sentence"
Trying: strsplit(mystr, '/"(?:\\\\.|(?!").)*%22|\\S+/')
gives:
Error in strsplit(mystr, "/\"(?:\\\\.|(?!\").)*%22|\\S+/") :
invalid regular expression '/"(?:\\.|(?!").)*%22|\S+/', reason 'Invalid regexp'
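A regex route is also possible. Below is a minimal sketch that assumes a single input string and no escaped quotes inside the quoted phrase. Rather than splitting, it extracts the tokens with gregexpr() and regmatches(), which is closer to what the PHP answer does with preg_match_all. The PHP `/.../` delimiters are dropped because they are not part of the pattern itself. The `%22` looks like a URL-encoded `"` picked up during copying, so a literal `"` is used instead. Every regex backslash is doubled because the pattern is an R string literal, and a single `\S` is exactly what triggers the "unrecognized escape" error. `perl = TRUE` selects the PCRE engine, which tries the alternatives left to right. `myfoo` is just the placeholder name from the question.

```r
# Hypothetical myfoo(): split on whitespace but keep double-quoted phrases whole.
# Sketch only: assumes one input string and no escaped quotes inside quotes.
myfoo <- function(x) {
  # Match either a double-quoted run of non-quote characters,
  # or a run of non-whitespace characters. The backslash in \\S is
  # doubled because the pattern is written as an R string literal.
  tokens <- regmatches(x, gregexpr('"[^"]*"|\\S+', x, perl = TRUE))[[1]]
  # Strip the surrounding double quotes from the quoted token.
  gsub('^"|"$', "", tokens)
}

mystr <- '"preceded by itself in quotation marks forms a complete sentence" preceded by itself in quotation marks forms a complete sentence'
myfoo(mystr)
```

If the text can contain backslash-escaped quotes, the quoted alternative would need to be extended. The PHP pattern's `(?:\\.|...)` part serves that purpose.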
A simple option would be to use scan:
> x <- scan(what = "", text = mystr)
Read 11 items
> x
[1] "preceded by itself in quotation marks forms a complete sentence"
[2] "preceded"
[3] "by"
[4] "itself"
[5] "in"
[6] "quotation"
[7] "marks"
[8] "forms"
[9] "a"
[10] "complete"
[11] "sentence"
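This works because, with the default whitespace separator, scan() treats both double and single quotes as quote characters, so the quoted phrase is read as one item. The sketch below is a slightly more explicit call. It assumes that only double quotes should group words, so `quote = '"'` stops a single quote (for example an apostrophe) from being treated as a grouping character. It also sets `quiet = TRUE` to suppress the "Read 11 items" message:

```r
mystr <- '"preceded by itself in quotation marks forms a complete sentence" preceded by itself in quotation marks forms a complete sentence'

# Treat only double quotes as grouping characters, and suppress the
# "Read N items" message that scan() prints by default.
x <- scan(text = mystr, what = "", quote = '"', quiet = TRUE)
x
```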