How would I approach writing a regex where, given a set of delimiters (for example, both ; and ,), I could get the following results on these examples:
coffee, water; tea -> [coffee, water, tea]
"coffee, black;", water; tea -> ["coffee, black;", water, tea]
To clarify, regular text cannot have spaces, quoted text can have spaces, delimiters inside the quotes are ignored, and all text is separated by delimiters.
I've been experimenting with regex myself, and haven't gotten the results that I want. I'm also working in an environment without lookaheads/lookbehinds. Any thoughts on how to achieve this?
Here is a good way:
(?:\r?\n|[,;]|^)[^\S\r\n]*((?:(?:[^\S\r\n]*[^,;"\s])*(?:"[^"]*")?[^,;"\s]*))[^\S\r\n]*
It also trims the whitespace around each element.
Nice demo here -> https://regex101.com/r/FsJtOE/1
Capture group 1 contains the element.
A simple find all should work.
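As a rough illustration of the find-all approach, here is a minimal sketch in Python. Python's re module is only a stand-in for whatever engine you are actually using (the pattern itself contains no lookaheads or lookbehinds), and the helper name split_fields is made up for this example.

```python
import re

# Pattern copied from the answer above; no lookaheads or lookbehinds are used.
FIELD = re.compile(
    r'(?:\r?\n|[,;]|^)[^\S\r\n]*'                          # delimiter or start, then leading whitespace
    r'((?:(?:[^\S\r\n]*[^,;"\s])*(?:"[^"]*")?[^,;"\s]*))'  # capture group 1: the element
    r'[^\S\r\n]*'                                          # trailing whitespace
)

def split_fields(text):
    """Hypothetical helper: return capture group 1 of every match."""
    # With exactly one capture group, findall returns a list of group-1 strings.
    return FIELD.findall(text)

print(split_fields('coffee, water; tea'))
# ['coffee', 'water', 'tea']
print(split_fields('"coffee, black;", water; tea'))
# ['"coffee, black;"', 'water', 'tea']
```

Quoted elements come back with their quotes still attached, so strip them afterwards if you only want the inner text.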
Note that RE2 has no assertions (lookaheads/lookbehinds), but handling every corner case really needs them.
Unfortunately, this is as close as you can get using that regex engine.
One thing this will do is allow multiple words in non-quoted fields.
Readable version
# Validate even quotes: ^[^"]*(?:"[^"]*"[^"]*)*$
# Then ->
# ----------------------------------------------
# Find all:

(?: \r? \n | [,;] | ^ )
[^\S\r\n]*
(                             # (1 start)
   (?:
      (?:
         [^\S\r\n]*
         [^,;"\s]
      )*
      (?: " [^"]* " )?
      [^,;"\s]*
   )
)                             # (1 end)
[^\S\r\n]*
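The two-step idea in the comments (check that the quotes are balanced, then find all) could be sketched like this in Python. This assumes an engine with a free-spacing mode (re.VERBOSE here) so the readable layout can be used as-is. The names EVEN_QUOTES, FIELD_VERBOSE and parse are invented for the example, and raising ValueError on unbalanced quotes is my own choice, not something the pattern requires.

```python
import re

# Step 1: the "validate even quotes" pattern from the comment above.
EVEN_QUOTES = re.compile(r'^[^"]*(?:"[^"]*"[^"]*)*$')

# Step 2: the readable find-all pattern, compiled in free-spacing mode.
FIELD_VERBOSE = re.compile(r'''
    (?: \r? \n | [,;] | ^ )            # line break, delimiter, or start of input
    [^\S\r\n]*                         # leading horizontal whitespace
    (                                  # (1 start)
      (?:
        (?: [^\S\r\n]* [^,;"\s] )*     # unquoted text, inner spaces allowed
        (?: " [^"]* " )?               # optional quoted part
        [^,;"\s]*                      # trailing unquoted characters
      )
    )                                  # (1 end)
    [^\S\r\n]*                         # trailing horizontal whitespace
''', re.VERBOSE)

def parse(text):
    """Hypothetical helper: reject unbalanced quotes, then collect group 1."""
    if EVEN_QUOTES.match(text) is None:
        raise ValueError('unbalanced double quotes')
    return FIELD_VERBOSE.findall(text)

print(parse('"coffee, black;", water; tea'))
# ['"coffee, black;"', 'water', 'tea']
```

Doing the quote check first matters because the find-all pattern treats the quoted part as optional, so a stray quote is not rejected by the pattern on its own.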