
Extract URLs from a string using PHP when either of two special symbols in the URL should be treated as a delimiter (the first character following the URL)?


To extract URLs I use the following (not a perfect solution, but I'm almost satisfied with it, since performance counts):

preg_match_all('#\bhttps?://[^,\s()<>]+(?:\([\w\d]+\)|([^,[:punct:]\s]|/))#', $string, $match);

However, it doesn't quite work for me: the URL should be cut off at ] or | if either of these two symbols occurs in the extracted URL.

I know these symbols are valid in URLs; however, in my case they should be treated as invalid. How should the preg_match_all call above be modified to recognize these two delimiters? Thank you.


Solution

  • [:punct:] is shorthand for [!"\#$%&'()*+,\-./:;<=>?@\[\\\]^_``{|}~].

    In your regex you are using the negated character class [^,[:punct:]\s], which could be written as [^!"\#$%&'()*+,\-./:;<=>?@\[\\\]^_``{|}~\s] (I've removed the leading comma because [:punct:] already includes it, and I've doubled the backquote for highlighting).

    If you want to allow ] and |, remove them from the character class:

    [^!"\#$%&'()*+,\-./:;<=>?@\[\\^_`{}~\s]
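
If the goal is instead to stop the match at ] or |, as the question asks, one possible approach (an assumption on my part, not something the answer above states) is to add both characters to the first negated class, [^,\s()<>], so that the greedy run ends as soon as either one is met. The last-character class can stay as it is, because [:punct:] already excludes ] and |. A minimal sketch, where the sample string and the variable names $original, $delimited, $before and $after are made up for illustration:

```php
<?php
// Hypothetical sample input: both URLs contain one of the two delimiter symbols.
$string = 'Links: http://example.com/a]b and https://foo.bar/a|b end';

// Pattern from the question, unchanged.
$original = '#\bhttps?://[^,\s()<>]+(?:\([\w\d]+\)|([^,[:punct:]\s]|/))#';

// Variant: "]" and "|" added to the first negated class, so the main
// run of URL characters can no longer pass over either symbol.
$delimited = '#\bhttps?://[^,\s()<>\]|]+(?:\([\w\d]+\)|([^,[:punct:]\s]|/))#';

preg_match_all($original, $string, $before);
preg_match_all($delimited, $string, $after);

// Index 0 holds the full matches (default PREG_PATTERN_ORDER).
print_r($before[0]); // http://example.com/a]b and https://foo.bar/a|b
print_r($after[0]);  // http://example.com/a   and https://foo.bar/a
```

With the variant, anything after the first ] or | is dropped from the match, including text that would otherwise have been a legitimate part of the URL, so this only fits cases like the one in the question where those symbols can never belong to a URL.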