php html regex html-parsing text-extraction

Get text from all <a> tags in a string


I'm completely useless at regex and this has been bugging me for the past half hour, so I'm posting it here; it's probably quite simple. I have a string like this:

<a href="/folder/files/hey/">hey.exe</a>
<a href="/folder/files/hey2/">hey2.dll</a>
<a href="/folder/files/pomp/">pomp.jpg</a>

In PHP, I need to extract the text between the <a> tags, for example:

hey.exe
hey2.dll
pomp.jpg

Solution

  • Avoid using '.*', even if you make it ungreedy, until you have more practice with regex, because it can easily match across tag boundaries. I think a good solution for you would be:

    '/<a[^>]+>([^<]+)<\/a>/i'
    

    Note the '/' delimiters: PHP's preg_* regex functions require the pattern to be wrapped in delimiters. Using preg_match_all, it would look like this:

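    $pattern = '/<a[^>]+>([^<]+)<\/a>/i';
    $string  = '<a href="/folder/files/hey/">hey.exe</a> <a href="/folder/files/hey2/">hey2.dll</a>';
    preg_match_all($pattern, $string, $matches);
    // all matches are stored in $matches as an array:
    // $matches[0] holds the full <a>...</a> matches,
    // $matches[1] holds the text between the <a></a> tags
    print_r($matches[1]);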
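To illustrate why the answer steers away from '.*', here is a small self-contained sketch comparing the recommended pattern with a greedy '.*' and an ungreedy '.*?' pattern. The helper name extractLinkTexts() is hypothetical, the three links are assumed to sit on one line (the question shows one per line), and the second input with an <abbr> tag is a made-up example, not from the question.

```php
<?php
// Sketch only: extractLinkTexts() is a hypothetical helper for this comparison.
function extractLinkTexts(string $html, string $pattern = '/<a[^>]+>([^<]+)<\/a>/i'): array
{
    // preg_match_all() stores capture group 1 (the link text) in $matches[1].
    preg_match_all($pattern, $html, $matches);
    return $matches[1];
}

// The input from the question, assumed to be on a single line.
$html = '<a href="/folder/files/hey/">hey.exe</a>'
      . '<a href="/folder/files/hey2/">hey2.dll</a>'
      . '<a href="/folder/files/pomp/">pomp.jpg</a>';

// [^>]+ stops at the end of the opening tag and [^<]+ stops before the
// closing tag, so each link is matched separately.
echo implode(', ', extractLinkTexts($html)), "\n"; // hey.exe, hey2.dll, pomp.jpg

// A greedy .* runs to the end of the line and backtracks only as far as the
// last possible match, so the three links collapse into one match.
echo implode(', ', extractLinkTexts($html, '/<a.*>(.*)<\/a>/i')), "\n"; // pomp.jpg

// Hypothetical input: an ungreedy .*? starts matching at <abbr> and its (.*?)
// crosses tag boundaries, while the [^<]+ version skips <abbr> entirely.
$tricky = '<abbr title="t">A</abbr> <a href="/x/">x.exe</a>';
echo implode(', ', extractLinkTexts($tricky, '/<a.*?>(.*?)<\/a>/i')), "\n"; // A</abbr> <a href="/x/">x.exe
echo implode(', ', extractLinkTexts($tricky)), "\n"; // x.exe
```

Both '[^>]+' and '[^<]+' are negated character classes: they cannot consume the '>' or '<' that ends the current tag or text, which is what keeps each match inside a single <a>...</a> pair.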