
More elegant (shorter) solution for this regex pattern


I have spent three days banging my head on how to find a single regex that matches anything between either single or double quotes, including escaped single or double quotes inside the source string, so that the matched text can be replaced, and I think I have succeeded. It works for both multi-line and single-line strings. In other words, this regex can be used to alter, change, or sanitize 'text' or "text" string literals in any source code (e.g. file_get_contents('some_class.php')) while leaving everything else untouched, assuming that code comments have already been removed beforehand.

Here is the regex wrapped in single quotes:

'@"[^"\\\\]*(?:\\\\.[^"\\\\]*)*"|\'[^\'\\\\]*(?:\\\\.[^\'\\\\]*)*\'@msu'

And here is the same regex wrapped in double quotes:

"@\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\"|'[^'\\\\]*(?:\\\\.[^'\\\\]*)*'@msu"

It matches source code like this perfectly:

// Very nasty php array
$Damn = [
  'a' => "' lorem ipsum '",
  'b' => '"\" ipsu\'m lorem  ',
  'c' => " \' YabadabaDooya \" ",
  'd\"' => ' 
     f"
     o\'"o  
                 \'bar" ',
  'e' => "'",
  "f" => '"'
];
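A minimal sketch of how the pattern above might be applied with preg_replace_callback, assuming the goal is to swap every string literal for a placeholder. The `$source` sample is a shortened version of the array, and the placeholder naming (`S1`, `S2`, ...) is purely illustrative. The pattern is written as a nowdoc, so each backslash appears exactly as PCRE sees it.

```php
<?php
// The question's pattern inside a nowdoc: no PHP-level escaping is applied,
// so "\\" here is the regex token for one literal backslash.
$pattern = <<<'RE'
@"[^"\\]*(?:\\.[^"\\]*)*"|'[^'\\]*(?:\\.[^'\\]*)*'@msu
RE;

// Shortened sample input (illustrative), also a nowdoc so nothing is interpolated.
$source = <<<'SRC'
$Damn = [
  'a' => "' lorem ipsum '",
  'b' => '"\" ipsu\'m lorem  ',
  'e' => "'",
  "f" => '"'
];
SRC;

$count = 0;
$result = preg_replace_callback(
    $pattern,
    function (array $m) use (&$count): string {
        $count++;
        $quote = $m[0][0];                      // first character is the delimiter
        return $quote . 'S' . $count . $quote;  // hypothetical placeholder
    },
    $source
);

echo $result, PHP_EOL;
```

With this sample the callback fires eight times, once per quoted literal, and the surrounding code (`$Damn = [`, `=>`, commas) is left as it was.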

Since this works as I expect, and I am not a PCRE guru (don't ask how much 'pain' I've had over the past three days D: before arriving at this solution), it would be superb if anyone who knows how is willing to help shrink the above regex into a more elegant, shorter form. I assume the | (or) in the middle of the pattern could be handled just once, at the beginning, and I have tried God only knows what to accomplish that, but with no luck.

So the general question is: what would a shorter variant of the above pattern look like?


Solution

  • For Spooky, try this Multi-Delimiter Common Core approach
    which is mostly your regex.

    <<<PCRE
    
        (["'`])((?:\\.|(?!\1|\\).)*)(\1)
    
    PCRE;
    

    https://regex101.com/r/LLWa6L/1

    <<<PCRE_EXPLAINED
    
         ( ["'`] )              # (1), The delimiters
         (                      # (2 start)
            (?:
               \\ .                   # Escape anything
             |                       # or,
               (?! \1 | \\ )          # Not a delimiter nor an escape
               .                      # Any character
            )*
         )                      # (2 end)
         ( \1 )                 # (3), Backreference to the delimiter
    
    PCRE_EXPLAINED;
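As a rough illustration, the backreference pattern above could be used from PHP along these lines. This is a sketch under a few assumptions: the `~` delimiter and the `s` modifier are choices made here (without `s`, `.` would not cross newlines, so multi-line strings would not match), and the sample input is invented. Note that this pattern also treats backticks as delimiters, which the question's regex does not.

```php
<?php
// The answer's pattern in a nowdoc. Group 1 captures the opening delimiter,
// group 2 the contents, and \1 requires the same delimiter to close the string.
$pattern = <<<'RE'
~(["'`])((?:\\.|(?!\1|\\).)*)(\1)~s
RE;

// Invented sample input with one literal per delimiter type.
$source = <<<'SRC'
$a = 'it\'s'; $b = "say \"hi\""; $c = `ls`;
SRC;

preg_match_all($pattern, $source, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    // $m[1] is the delimiter, $m[2] the raw contents (escapes left intact).
    printf("%s => %s\n", $m[1], $m[2]);
}
```

Because the delimiter is captured once and reused through `\1`, the body of the string is described only one time, which is the shortening the question asks about.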