I have spent three days banging my head against one problem: finding a single pattern that matches anything between single or double quotes, including escaped single or double quotes inside the actual source string, so that the matched text can be replaced. I think I have succeeded. It works on both single-line and multi-line input. In other words, this regex can be used to alter or sanitize 'text' or "text" (that is, string literals) in any source code, e.g. the result of file_get_contents('some_class.php'), while leaving everything else untouched, assuming that code comments have already been removed beforehand.
Here is the regex wrapped in single quotes:
'@"[^"\\\\]*(?:\\\\.[^"\\\\]*)*"|\'[^\'\\\\]*(?:\\\\.[^\'\\\\]*)*\'@msu'
... and here is the same regex wrapped in double quotes:
"@\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\"|'[^'\\\\]*(?:\\\\.[^'\\\\]*)*'@msu"
It matches source code like this perfectly:
// Very nasty php array
$Damn = [
'a' => "' lorem ipsum '",
'b' => '"\" ipsu\'m lorem ',
'c' => " \' YabadabaDooya \" ",
'd\"' => '
f"
o\'"o
\'bar" ',
'e' => "'",
"f" => '"'
];
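As a minimal sketch of the replacement described above, the single-quoted pattern can be passed to preg_replace_callback() so that every string literal is swapped for a placeholder while the rest of the code stays untouched. The sample source, the $literals array and the <STRn> placeholder format are assumptions made for illustration only.

```php
<?php
// Pattern from the question, in its single-quoted PHP form.
$pattern = '@"[^"\\\\]*(?:\\\\.[^"\\\\]*)*"|\'[^\'\\\\]*(?:\\\\.[^\'\\\\]*)*\'@msu';

// Hypothetical sample source; a nowdoc keeps quotes and backslashes literal.
$source = <<<'SRC'
$Damn = [
'a' => "' lorem ipsum '",
'b' => '"\" ipsu\'m lorem ',
"f" => '"'
];
SRC;

// Collect each matched literal and replace it with a numbered placeholder.
$literals = [];
$masked = preg_replace_callback(
    $pattern,
    function (array $m) use (&$literals): string {
        $literals[] = $m[0];
        return '<STR' . (count($literals) - 1) . '>';
    },
    $source
);

echo $masked, PHP_EOL;
print_r($literals);
```

Keeping the literals in an array means they can be sanitized separately and put back afterwards by replacing each placeholder with its stored (or modified) value.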
Since this works as I expect, and I am not a PCRE guru (don't ask how much 'pain' I've had over the past three days D: until I arrived at this solution), I would be grateful if anyone who knows how could help shrink the above regex into a shorter, more elegant one. I assume that the | (or) in the middle of the pattern can be moved to the beginning and used just once. I tried God only knows what to accomplish that, but had no luck.
So, the general question is: what would a shorter variant of the above pattern look like?
For Spooky, try this multi-delimiter "common core" approach, which is mostly your regex.
<<<'PCRE'
(["'`])((?:\\.|(?!\1|\\).)*)(\1)
PCRE;
https://regex101.com/r/LLWa6L/1
<<<'PCRE_EXPLAINED'
( ["'`] ) # (1), The delimiters
( # (2 start)
(?:
\\ . # Escape anything
| # or,
(?! \1 | \\ ) # Not a delimiter nor an escape
. # Any character
)*
) # (2 end)
( \1 ) # (3), Backreference to the delimiter
PCRE_EXPLAINED;
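A minimal sketch of how the compact pattern might be used from PHP. The "~" delimiter, the "s" flag (so that "." also matches newlines inside multi-line literals) and the sample source are assumptions, not part of the pattern above. The pattern is stored in a nowdoc (quoted identifier) because a plain heredoc would process the backslashes, for example reading \1 as an octal escape.

```php
<?php
// Nowdoc keeps every backslash literal, so \\ and \1 reach PCRE unchanged.
$core = <<<'PCRE'
(["'`])((?:\\.|(?!\1|\\).)*)(\1)
PCRE;

// Assumed delimiter "~" and flag "s" (dot matches newline).
$pattern = '~' . $core . '~s';

// Hypothetical sample source with all three delimiters and a multi-line literal.
$source = <<<'SRC'
$x = ['e' => "'", 'g' => `ls \`pwd\``, 'h' => 'it\'s
two lines'];
SRC;

preg_match_all($pattern, $source, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    // $m[1] is the opening delimiter, $m[2] the text between the delimiters.
    printf("%s | %s\n", $m[1], $m[2]);
}
```

Because the opening delimiter is captured in group 1 and reused through \1, one branch covers all three quote styles, which removes the duplicated alternation from the original pattern. The trade-off is a lookahead on every character instead of the unrolled negated character classes.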