regex, escaping, php-5.4, preg-replace-callback

Get a substring up to the current match inside preg_replace_callback


I've simplified this question because it was getting quite long. Basically, I want the substring of $subject that runs from the start of $subject up to the match the callback function is currently processing. Here is an example of some input (JavaScript):

$subject = "var myUrl = '<a href=\"http://google.co.uk\">click me</a>';";

I'm using a URL-matching regex in my preg_replace_callback, so it will match http://google.co.uk. I want the substring of $subject up to the start of that match, i.e. the substring should contain var myUrl = '<a href=". How can I do this?

$subject = "var myUrl = '<a href=\"http://google.co.uk\">click me</a>';";
preg_replace_callback("MY URL MATCHING PATTERN", function($matches) use ($subject) {
  // Get length of $subject up to the current match
  $length = ?; // this is the bit I can't work out
  // Get substring
  $before = substr($subject, 0, $length);
  // Work out whether or not to escape the single quotes
  $quotes = array();
  preg_match_all("/'/", $before, $quotes);
  $quotecount = count($quotes[0]); // $quotes[0] holds the full-pattern matches; count($quotes) is always 1
  $escape = ($quotecount % 2 == 0 ? "" : "\\");
  // Return the binary value
  return "javascript:parent.query(".$escape."'".textToBinary($matches[0]).$escape."')";
}, $subject);
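The quote-counting part of that callback can be checked on its own. Below is a sketch with a hypothetical helper, needsQuoteEscape(), that counts quotes with substr_count() instead of preg_match_all(). It assumes the text before the match contains no escaped quotes (\'), which would throw the count off.

```php
<?php
// Hypothetical helper: true when $before ends inside an open single-quoted
// string, i.e. it contains an odd number of single quotes.
// Assumes $before contains no escaped quotes (\').
function needsQuoteEscape($before)
{
    return substr_count($before, "'") % 2 === 1;
}

$subject = "var myUrl = '<a href=\"http://google.co.uk\">click me</a>';";
// For this demo only: the prefix up to the first "http://" in $subject
$before = substr($subject, 0, strpos($subject, 'http://'));

var_dump(needsQuoteEscape($before)); // bool(true)
```

With one single quote before the URL, the count is odd, so the quotes in the replacement need a backslash.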

Solution

    - Firstly, I recommend using DOM functionality such as PHP's DOMDocument or DOMXPath.

    - Secondly, it is better to revise your regex. (\S is the culprit: it matches any non-whitespace character, including quotes and >.)

    - Thirdly, a quick solution to your problem is:

    return "javascript:open('".str_replace("'", "\\'", $matches[0])."')";
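Following the first suggestion, here is a rough sketch of the DOM route. It assumes the HTML fragment has already been separated from the surrounding JavaScript; the rewriteLinks() name is hypothetical, and it writes the plain URL instead of calling textToBinary(), which the question doesn't show.

```php
<?php
// Sketch: rewrite the href of every <a> in an HTML fragment with DOMDocument
// instead of a regex. Needs PHP's dom extension (enabled in most builds).
function rewriteLinks($html)
{
    $doc = new DOMDocument();
    // Wrap the fragment in a <div> so it can be found again after parsing,
    // and collect libxml's parser warnings instead of printing them
    $prev = libxml_use_internal_errors(true);
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');
        // The serializer takes care of quoting the HTML attribute value
        $a->setAttribute('href', "javascript:parent.query('" . $href . "')");
    }

    // Serialize only the children of the wrapper <div>
    $wrap = $doc->getElementsByTagName('div')->item(0);
    $out = '';
    foreach ($wrap->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo rewriteLinks('<a href="http://google.co.uk">click me</a>');
```

The parser finds the links, so no URL regex is involved. If the rewritten fragment is then embedded back into a single-quoted JavaScript string, its ' characters still need escaping at that level.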
    

    Updated:

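    $subject = "var myUrl = '<a href=\"http://google.co.uk\">click me</a>';";
    
    $pattern = "@(https?://([-\w\.]+)(:\d+)?(/([-\w/_\.]*(\?\S+)?)?)?)@";
    $offset = 0; // end of the previous match, shared with the callback by reference
    $result = preg_replace_callback($pattern, function($matches) use ($subject, &$offset) {
      // Find this match at or after the end of the previous one, so a URL that
      // occurs twice is not always resolved to its first occurrence
      $pos = strpos($subject, $matches[0], $offset);
      $offset = $pos + strlen($matches[0]);
      // Everything in $subject before the current match
      $str = substr($subject, 0, $pos);
      // strpos() returns 0 for a quote at the very start, so compare with === false
      $escape = (strpos($str, "'") === false) ? "'" : "\\'";
      return "javascript:parent.query({$escape}".textToBinary($matches[0])."{$escape})";
    }, $subject);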
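On PHP 7.4 and later (newer than the php-5.4 tag on this question), preg_replace_callback() accepts a $flags argument. Passing PREG_OFFSET_CAPTURE gives the callback each match's byte offset directly, so no strpos() search is needed. Below is a sketch under that assumption. The simplified URL pattern stands in for the real one, and the plain URL is used in place of textToBinary().

```php
<?php
// Requires PHP 7.4+, where preg_replace_callback() takes a $flags argument.
// With PREG_OFFSET_CAPTURE each $matches entry is [text, byte offset].
$subject = "var myUrl = '<a href=\"http://google.co.uk\">click me</a>';";
$pattern = '@https?://[-\w.]+(?::\d+)?(?:/[-\w/.]*)?@'; // simplified stand-in pattern

$result = preg_replace_callback(
    $pattern,
    function (array $matches) use ($subject) {
        [$url, $offset] = $matches[0];
        // Everything in $subject before the start of this match
        $before = substr($subject, 0, $offset);
        // Odd number of single quotes => we are inside a JS string literal
        $q = substr_count($before, "'") % 2 === 1 ? "\\'" : "'";
        return "javascript:parent.query({$q}" . $url . "{$q})";
    },
    $subject,
    -1,      // no limit on replacements
    $count,  // receives the number of replacements made
    PREG_OFFSET_CAPTURE
);

echo $result;
```

Because the offset comes from the regex engine itself, a URL that appears more than once is handled at each of its own positions.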