
Regex - Ignore some parts of string in match


Here's my string:

address='St Marks Church',notes='The North East\'s premier...'

The regex I'm using to grab the various parts using match_all is:

'/(address|notes)='(.+?)'/i'

The results are:

address => St Marks Church
notes => The North East\

How can I get it to ignore the \' character for the notes?
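A minimal sketch that reproduces the behaviour described above. It assumes PHP's preg_match_all(), which is also what the answer below infers, and the variable names are my own. The lazy (.+?) stops at the first apostrophe it meets, including the escaped one in East\'s:

```php
<?php
// Assumption: PHP with preg_match_all(), as inferred in the answer below.
// In a double-quoted PHP string \' is not an escape sequence, so the backslash stays in the data.
$string = "address='St Marks Church',notes='The North East\'s premier...'";

// The question's pattern: (.+?) is lazy, so it stops at the first ' it meets,
// including the escaped one right after "East\".
preg_match_all("/(address|notes)='(.+?)'/i", $string, $out);

// $out[1] holds the keys, $out[2] the captured values.
$result = array_combine($out[1], $out[2]);
var_export($result);
```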


Solution

  • Because you have posted that you are using match_all and the top tags in your profile are php and wordpress, I think it is fair to assume you are using preg_match_all() with PHP.

    The following patterns will match the substrings required to build your desired associative array:

    Patterns that generate a full-string match and 1 capture group:

    1. /(address|notes)='\K(?:\\\'|[^'])*/ (166 steps)
    2. /(address|notes)='\K.*?(?=(?<!\\)')/ (218 steps)

    Patterns that generate 2 capture groups:

    3. /(address|notes)='((?:\\\'|[^'])*)/ (168 steps)
    4. /(address|notes)='(.*?(?<!\\))'/ (209 steps)

    Code:

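    // In a double-quoted PHP string \' is not an escape sequence, so the backslash stays in the data.
    $string = "address='St Marks Church',notes='The North East\'s premier...'";
    
    // Pattern #1: \K resets the match start, so $out[0] holds the values and $out[1] the keys.
    preg_match_all(
        "/(address|notes)='\K(?:\\\'|[^'])*/",
        $string,
        $out
    );
    var_export(array_combine($out[1], $out[0]));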
    
    echo "\n---\n";
    
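    // Pattern #3: with PREG_SET_ORDER each set is [full match, key, value],
    // so array_column() indexes column 2 (value) by column 1 (key).
    preg_match_all(
        "/(address|notes)='((?:\\\'|[^'])*)/",
        $string,
        $out,
        PREG_SET_ORDER
    );
    var_export(array_column($out, 2, 1));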
    

    Output:

    array (
      'address' => 'St Marks Church',
      'notes' => 'The North East\\\'s premier...',
    )
    ---
    array (
      'address' => 'St Marks Church',
      'notes' => 'The North East\\\'s premier...',
    )
    

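    Patterns #1 and #3 use alternation to match either an escaped apostrophe (a backslash followed by an apostrophe) or any character that is not an apostrophe, so the match stops at the first apostrophe that is not escaped.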

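    Patterns #2 and #4 use lookarounds to ensure that an apostrophe preceded by a backslash doesn't end the match. When written inside a PHP string literal, they require an additional backslash: the lookbehind must be written as (?<!\\\) rather than (?<!\\), because PHP turns \\ into a single backslash before the pattern reaches the regex engine.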
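For completeness, here is a sketch of how patterns #2 and #4 might look in PHP, following the same structure as the demo code (the variable names are my own). Note the three backslashes in (?<!\\\) inside the double-quoted strings, which reach the regex engine as (?<!\\):

```php
<?php
$string = "address='St Marks Church',notes='The North East\'s premier...'";

// Pattern #2: \K plus a lookahead. In the double-quoted string, "\\\)" becomes "\\)",
// i.e. a literal backslash inside the lookbehind. Writing only "\\)" would leave "\)",
// which escapes the closing parenthesis and breaks the pattern.
preg_match_all("/(address|notes)='\K.*?(?=(?<!\\\)')/", $string, $out);
$withLookahead = array_combine($out[1], $out[0]);

// Pattern #4: two capture groups; the lookbehind sits just before the closing apostrophe.
preg_match_all("/(address|notes)='(.*?(?<!\\\))'/", $string, $out, PREG_SET_ORDER);
$withLookbehind = array_column($out, 2, 1);

var_export($withLookahead);
echo "\n---\n";
var_export($withLookbehind);
```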

    Some notes:

    • Capture groups, alternation, and lookarounds often cost pattern efficiency, so limiting their use tends to improve performance. Negated character classes with greedy quantifiers also tend to help.

    • Using \K (which resets the starting point of the full-string match) is useful when trying to reduce capture groups, and it reduces the size of the output array.
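To illustrate the note about negated character classes, here is a sketch of a different pattern that is not one of the four above: the "unrolled loop" form, which uses [^'\\] with greedy quantifiers and consumes a backslash together with whatever character follows it. No step count is claimed for it, so treat it as an alternative to try rather than a proven improvement. The pattern is kept in a nowdoc so PHP performs no backslash processing on it; the variable names are my own.

```php
<?php
$string = "address='St Marks Church',notes='The North East\'s premier...'";

// Nowdoc: PHP leaves every backslash untouched, so the regex engine sees exactly this.
//   [^'\\]*          greedily eats characters that are neither ' nor \
//   (?:\\.[^'\\]*)*  then any backslash-escaped character, followed by more of the same
$pattern = <<<'REGEX'
/(address|notes)='([^'\\]*(?:\\.[^'\\]*)*)'/
REGEX;

preg_match_all($pattern, $string, $out, PREG_SET_ORDER);

// Each set is [full match, key, value], as in pattern #3.
$result = array_column($out, 2, 1);
var_export($result);
```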
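A small sketch of the \K point, comparing the top-level shape of the output array for pattern #1 (with \K) and pattern #3 (a capture group instead); the variable names are illustrative:

```php
<?php
$string = "address='St Marks Church',notes='The North East\'s premier...'";

// Pattern #1: \K drops "address='" from the full-string match,
// so the value lives in index 0 and only one capture group is needed.
preg_match_all("/(address|notes)='\K(?:\\\'|[^'])*/", $string, $withK);

// Pattern #3: same logic, but the value needs its own capture group.
preg_match_all("/(address|notes)='((?:\\\'|[^'])*)/", $string, $withoutK);

echo count($withK), "\n";    // 2: full matches + group 1
echo count($withoutK), "\n"; // 3: full matches + groups 1 and 2
var_export($withoutK[0]);    // these full matches still include the key and ='
```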