Tags: php, regex, preg-match, preg-match-all

Regex to match previous names


I have been having some trouble writing a regex to match the previous names on this page: http://steamcommunity.com/id/TripleThreat/namehistory

To be clear, I want an array containing the following:

  • TripleThreat
  • [FD] TripleThreat.blyat
  • 9

and so on.

I have already tried writing the regex, but it was a disaster (regex is something I struggle with).

Here's what I wrote:

$page = file_get_contents("http://steamcommunity.com/id/TripleThreat/namehistory");

preg_match_all("/<span class=\"historyDash\">-<\/span>((.|\n)*)<\/div>/", $page, $matches);

foreach($matches[0] as $match) {
    echo($match . "<br/>");
}

Any help is much appreciated :)


Solution

  • You can try the following regex (the match is in the first capturing group):

    "/<span class=\"historyDash\">-<\/span>\s*((?:[^\<]|\n)*?)\s*<\/div>/"
    

    See it on Regex101.

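    The changes I made: trimmed the whitespace before and after the name with \s*, changed the . to [^\<] so the group only matches text that is not part of a tag (i.e., the name itself), and made the quantifier lazy (*?) so the trailing \s* can strip the whitespace before </div>. Since [^\<] already matches newlines, the |\n alternative is redundant; ([^<]*?) behaves the same.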
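A minimal usage sketch of the pattern above: it runs preg_match_all() on a small hand-written HTML snippet (an assumed stand-in for the namehistory markup, not the real page) and reads the names from $matches[1]. The question's loop prints $matches[0], which holds the whole match including the tags.

```php
<?php
// Hand-written sample HTML (assumption): each name follows the
// historyDash span and ends at the closing </div>, as the regex expects.
$html = <<<HTML
<div>
    <span class="historyDash">-</span>
    TripleThreat
</div>
<div>
    <span class="historyDash">-</span>
    [FD] TripleThreat.blyat
</div>
HTML;

// The pattern exactly as written in the answer (double-quoted PHP string).
$pattern = "/<span class=\"historyDash\">-<\/span>\s*((?:[^\<]|\n)*?)\s*<\/div>/";

preg_match_all($pattern, $html, $matches);

// $matches[0] holds the full matches (tags included);
// $matches[1] holds only the text captured by the first group.
$names = $matches[1];

foreach ($names as $name) {
    echo $name . "<br/>";
}
```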


    Note: As @PedroLobito pointed out, don't parse HTML with regex unless necessary; use a library to parse the DOM instead when you can. I just provided a simple example that builds on your attempt, but it might not be the best solution.
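Following that advice, here is a minimal sketch using PHP's built-in DOM extension (DOMDocument and DOMXPath) instead of a regex. Like the regex, it assumes each name is the text node immediately after <span class="historyDash">-</span>, and that the span's class attribute is exactly historyDash. extractPreviousNames() is a hypothetical helper name, and the HTML is a hand-written sample, not the live page.

```php
<?php
/**
 * Returns the trimmed text that directly follows each historyDash span.
 * Assumption: the class attribute is exactly "historyDash" (no extra classes).
 */
function extractPreviousNames(string $html): array
{
    $doc = new DOMDocument();

    // Real-world HTML is rarely valid; collect parser warnings instead of
    // printing them, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // First text node that follows each <span class="historyDash">.
    $nodes = $xpath->query('//span[@class="historyDash"]/following-sibling::text()[1]');

    $names = [];
    foreach ($nodes as $node) {
        $name = trim($node->nodeValue);
        if ($name !== '') {
            $names[] = $name;
        }
    }
    return $names;
}

// Hand-written sample (assumption), mirroring what the regex expects.
$html = <<<HTML
<div>
    <span class="historyDash">-</span>
    TripleThreat
</div>
<div>
    <span class="historyDash">-</span>
    [FD] TripleThreat.blyat
</div>
HTML;

foreach (extractPreviousNames($html) as $name) {
    echo $name . "<br/>";
}
```

Against the live page you would pass the result of file_get_contents() instead of the sample. Note that loadHTML() assumes ISO-8859-1 unless the document declares its charset, so names with non-ASCII characters may come out garbled if the page does not declare UTF-8.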