Explode and/or regex text to HTML link in PHP


I have a database of texts that contain this kind of syntax in the middle of English sentences, and I need to turn it into HTML links using PHP:

"text1(text1)":http://www.example.com/mypage

Notes:

  • text1 is always identical to the text in parentheses

  • The whole string always has the quotation marks, parentheses, and colon, so the syntax is the same for each.

  • Sometimes there is a space at the end of the string, but other times there is a question mark or comma or other punctuation mark.

  • I need to turn these into basic links, like:

<a href="http://www.example.com/mypage">text1</a>

How do I do this? Do I need explode or regex or both?


Solution

  • You can use this replacement:

    $pattern = '~"([^("]+)\(\1\)":(http://\S+)(?=[\s\pP]|\z)~'; 
    
    $replacement = '<a href="\2">\1</a>'; 
    
    $result = preg_replace($pattern, $replacement, $text);
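
    As a quick check, here is a minimal, self-contained sketch that runs the replacement above on a sample sentence. The sample text is made up for illustration; it also contains a plain quoted word with no parentheses, which the pattern leaves untouched.

    ```php
    <?php
    // Pattern and replacement exactly as given in the answer.
    $pattern = '~"([^("]+)\(\1\)":(http://\S+)(?=[\s\pP]|\z)~';
    $replacement = '<a href="\2">\1</a>';

    // Hypothetical sample text: one plain quoted word, one link in the special syntax.
    $text = 'He said "hello" and linked "text1(text1)":http://www.example.com/mypage for details.';

    $result = preg_replace($pattern, $replacement, $text);

    echo $result, PHP_EOL;
    // He said "hello" and linked <a href="http://www.example.com/mypage">text1</a> for details.
    ```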
    

    Pattern details:

    ([^("]+) captures text1 in group 1. Using a negated character class (one that excludes the double quote and the opening parenthesis) has two advantages:

    • it lets you use a greedy quantifier, which is faster
    • since the class excludes the opening parenthesis and is immediately followed by a parenthesis in the pattern, the regex engine does not backtrack to test other possibilities when another part of the text has content between double quotes but no parentheses inside; it simply skips that substring. (This is because the PCRE regex engine automatically converts [^a]+a into [^a]++a before processing the string.)

    \S+ matches one or more non-whitespace characters.

    (?=[\s\pP]|\z) is a lookahead assertion that checks that the URL is followed by a whitespace character, a punctuation character (\pP), or the end of the string.
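
    One caveat worth checking against your data: because \S+ is greedy, it runs up to the next whitespace, so a question mark or comma placed directly after the URL ends up inside the href, and the lookahead is then satisfied by the whitespace that follows. The sketch below shows one possible variant that leaves such punctuation outside the link. It assumes the trailing characters to exclude are limited to ? ! . , ; and : (that list, the $strictPattern name, and the sample text are assumptions made for illustration, not part of the answer above).

    ```php
    <?php
    // Pattern from the answer: greedy \S+ keeps the trailing "?" in the URL.
    $pattern = '~"([^("]+)\(\1\)":(http://\S+)(?=[\s\pP]|\z)~';

    // Variant (assumption): lazy \S+? stops as soon as what follows is optional
    // trailing punctuation and then whitespace or the end of the string.
    $strictPattern = '~"([^("]+)\(\1\)":(http://\S+?)(?=[?!.,;:]*(?:\s|\z))~';

    $replacement = '<a href="\2">\1</a>';

    // Hypothetical sample text with a question mark right after the URL.
    $text = 'Did you read "text1(text1)":http://www.example.com/mypage? I did.';

    $greedy = preg_replace($pattern, $replacement, $text);
    $strict = preg_replace($strictPattern, $replacement, $text);

    echo $greedy, PHP_EOL;
    // Did you read <a href="http://www.example.com/mypage?">text1</a> I did.
    echo $strict, PHP_EOL;
    // Did you read <a href="http://www.example.com/mypage">text1</a>? I did.
    ```

    The trade-off is that a URL that legitimately ends in one of the listed characters would lose it, so adjust the list to what actually occurs in your texts.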