php, regex, replace, preg-replace

How to replace http with https in a string, but only for specific domains, in PHP


I need to parse strings of HTML content and, where possible, change the URLs of images hosted on other domains from http to https. The issue is that not all of the external domains support https, so I can't blanket-replace http with https.

So I want to do this with a list of domains I know work with https.

There is the small added complication that the search has to work for these domains regardless of whether www. is present.

Using the example given by @Wiktor I have something close to what I want, but it needs reversing: the replacement should run when a listed domain is matched, not when it isn't, which is how this code currently behaves.

/http(?!:\/\/(?:[^\/]+\.)?(?:example\.com|main\.com)\b)/i
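To make the "wrong way round" behaviour concrete, here is a minimal sketch that runs the pattern above as written. The sample string is made up for illustration; only the pattern comes from the question.

```php
<?php
// Pattern from the question: a NEGATIVE lookahead, so "http" matches only
// when the host is NOT one of the listed domains.
$negative = '/http(?!:\/\/(?:[^\/]+\.)?(?:example\.com|main\.com)\b)/i';

// Hypothetical sample input, not taken from the question.
$sample = 'http://example.com http://www.main.com http://let.com';

$result = preg_replace($negative, 'https', $sample);
echo $result, PHP_EOL;
// => http://example.com http://www.main.com https://let.com
```

Only the unlisted domain is upgraded, which is the opposite of what is wanted.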

Solution

  • I believe you can use a positive lookahead in place of the negative one:

    $domains = array("example.com", "main.com");
    $s = "http://example.com http://main.main.com http://let.com";
    // Build a positive lookahead from the domain list. Each domain is
    // escaped (including the "/" delimiter) so it is matched literally.
    $re = '/http(?=:\/\/(?:[^\/]+\.)?(?:'
          . implode("|", array_map(function ($x) {
                 return preg_quote($x, '/');
              }, $domains))
          . ')\b)/i';
    echo preg_replace($re, "https", $s);
    // => https://example.com https://main.main.com http://let.com
    

    See the IDEONE demo

    The regex matches:

    • http - the literal http, matched only if followed by...
    • (?= - start of the positive lookahead (it consumes no characters, so only http itself is replaced)
      • :\/\/ - a literal :// substring
      • (?:[^\/]+\.)? - an optional sequence of 1+ chars other than /, followed by a . (this covers www. and other subdomains)
      • (?: + implode code - builds an alternation group from the escaped domain names, so that any one of the alternatives (example\.com, main\.com, etc.) can match
      • ) - end of the alternation group
      • \b - a word boundary, so the domain is not immediately followed by another word character
    • ) - end of the lookahead
    • /i - case-insensitive modifier.
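Since the question is about image URLs inside HTML, it may help to wrap the same idea in a function. The sketch below is one possible variant, not the answer's exact pattern: the name `upgradeToHttps`, the sample HTML, and the tighter host matching are all assumptions added for illustration. It restricts the subdomain part to hostname characters (because `[^\/]+` can also run across quotes and spaces when a URL has no path) and requires the host to end right after the listed domain (because `\b` alone would also accept `example.com.evil.org`).

```php
<?php
/**
 * Hypothetical helper: rewrite http to https, but only for URLs whose host
 * is one of $domains or a subdomain of one (www. included).
 */
function upgradeToHttps(string $html, array $domains): string
{
    // Escape each domain for literal use inside a "/"-delimited pattern.
    $alternatives = implode('|', array_map(function ($d) {
        return preg_quote($d, '/');
    }, $domains));

    // Assumption: subdomain labels contain only letters, digits and hyphens,
    // and the host must not continue after the listed domain.
    $re = '/\bhttp(?=:\/\/(?:[a-z0-9-]+\.)*(?:' . $alternatives . ')(?![a-z0-9.-]))/i';

    // preg_replace() returns null on error; fall back to the input unchanged.
    return preg_replace($re, 'https', $html) ?? $html;
}

// Made-up sample input.
$html = '<img src="http://www.example.com/a.png">'
      . '<img src="http://let.com" alt="see www.example.com">'
      . '<img src="http://example.com.evil.org/c.png">';

echo upgradeToHttps($html, array('example.com', 'main.com')), PHP_EOL;
```

With this input only the first `src` should change to https. The trade-off is that a URL written with a trailing dot after the host (for example at the end of a sentence) is left as http, which seems the safer failure mode for this use case.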