I'm trying to split a UTF-8 string on a quote character (") with delimiter capture, except where that quote is followed by a second quote ("") so that (for example)
"A ""B"" C" & "D ""E"" F"
will split into three elements
"A ""B"" C"
&
"D ""E"" F"
I've been attempting to use:
$string = '"A ""B"" C" & "D ""E"" F"';
$temp = preg_split(
'/"[^"]/mui',
$string,
null,
PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE
);
but without success as it gives me
array(7) {
[0]=>
string(2) " ""
[1]=>
string(1) """
[2]=>
string(1) "C"
[3]=>
string(2) "& "
[4]=>
string(2) " ""
[5]=>
string(1) """
[6]=>
string(2) "F""
}
So it's losing any character that immediately follows a quote, unless that character is also a quote.
In this example the first and last characters of the string are quotes, but that may not always be the case, e.g.
{ "A ""B"" C" & "D ""E"" F" }
needs to split into five elements
{
"A ""B"" C"
&
"D ""E"" F"
}
Can anybody help me get this working?
Since you said that you don't mind the quotes being consumed by the split, you can use this expression:
(?<!")\s?"\s?(?!")
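It uses two negative lookarounds: the lookbehind (?<!") rejects a quote that is preceded by another quote, and the lookahead (?!") rejects a quote that is followed by another quote. The output on your second sample will be: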
{
A ""B"" C
&
D ""E"" F
}
[I put a \s? on each side of the quote to consume one optional whitespace character next to it; remove them if you want to keep that whitespace.]
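As a rough sketch of how that expression could be plugged into preg_split: the variable names are mine, and I'm assuming you still want the /u flag from your attempt. The second half is not the lookaround split at all but a different technique (matching tokens with preg_match_all), in case you do want the quotes kept as in your question. Note that it also breaks unquoted text on whitespace, so treat it as an assumption about your input rather than a drop-in replacement.

```php
<?php
// Pattern from the answer: a quote that is neither preceded nor followed by
// another quote, plus at most one whitespace character on each side.
$splitPattern = '/(?<!")\s?"\s?(?!")/u';

$string = '{ "A ""B"" C" & "D ""E"" F" }';

// -1 means "no limit"; PREG_SPLIT_NO_EMPTY drops the empty pieces that
// appear when the string starts or ends with a delimiter.
$parts = preg_split($splitPattern, $string, -1, PREG_SPLIT_NO_EMPTY);
print_r($parts);
// $parts is: '{', 'A ""B"" C', '&', 'D ""E"" F', '}'

// Alternative technique (an assumption, not part of the expression above):
// match tokens instead of splitting, so the surrounding quotes are kept.
// A token is either a quoted run in which "" counts as an escaped quote,
// or a run of characters that are neither whitespace nor quotes.
$tokenPattern = '/"(?:[^"]|"")*"|[^\s"]+/u';

preg_match_all($tokenPattern, $string, $matches);
$tokens = $matches[0];
print_r($tokens);
// $tokens is: '{', '"A ""B"" C"', '&', '"D ""E"" F"', '}'
```

One caveat with the split version: a quoted value that ends in an escaped quote, such as "A ""B""", finishes with three quotes in a row, so the lookbehind rejects the closing quote and no split happens there.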