Context: I have a list of keywords that sometimes consist of one word (e.g. poisson, normal, ...) and sometimes consist of several words, in which case they are enclosed in single quotes ('Two-way ANOVA', 'Generalized linear model', ...). All keywords are separated by whitespace in a single string.
Question: How can I extract each keyword from the list, accounting for the ones that are within single quotes?
Example:
What I have:
kw <- "poisson normal 'negative binomial' log-likelihood"
What I want:
c("poisson", "normal", "negative binomial", "log-likelihood")
One option is a regex "find all" approach, matching on the following pattern:
'.*?'|\S+
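The alternation is tried left to right: at each position the regex first attempts to match a single-quoted term (lazily, up to the next single quote), and if that fails, it falls back to matching a run of non-whitespace characters, i.e. an unquoted term.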
library(stringr)
kw <- "poisson normal 'negative binomial' log-likelihood"
output <- str_extract_all(kw, "'.*?'|\\S+")
output
[[1]]
[1] "poisson" "normal" "'negative binomial'"
[4] "log-likelihood"
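Note that the quoted term in the output above still carries its single quotes, whereas the desired result has them removed. A minimal sketch of one way to strip them, assuming that no keyword legitimately starts or ends with a single quote (the names `tokens` and `keywords` are only illustrative):

```r
library(stringr)

kw <- "poisson normal 'negative binomial' log-likelihood"

# Extract quoted terms first, then any remaining non-whitespace runs
tokens <- str_extract_all(kw, "'.*?'|\\S+")[[1]]

# Drop a single quote at the start and/or at the end of each token
keywords <- str_replace_all(tokens, "^'|'$", "")

keywords
# elements: "poisson", "normal", "negative binomial", "log-likelihood"
```

Because only the first and last character of each token are touched, apostrophes inside a keyword are left alone. A keyword that ends in an apostrophe would lose it, so the pattern would need adjusting for such data.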