php html regex file-get-contents

PHP: get the text of an h2 tag inside a div from a page fetched with file_get_contents


This function used to work fine: given an id typed by the user, it fetched a certain web page and returned the text inside a div tag:

function get_text($id) {
  $result = file_get_contents('www.site.net/' . $id);
  $regex = '/<div class="x">([^<]*)<\/div>/';
  if (preg_match($regex, $result, $matches) && !empty($matches[1])) {
    return $matches[1];
  } else {
    return 'N/A';
  }
}

Now the text is harder to get, because it sits in an h2 nested inside the div:

 <div class="X2">
   <h2 style="font-family: 'Pacifico', cursive;">TEXT</h2>
 </div>

I tried matching both the div and the h2, but I get nothing back. Please help! Thank you.


Solution

  • This is quite easily solved using PHP's DOMDocument:

    $html = <<<'EOT'
    <div class="X2">
     <h2 style="font-family: 'Pacifico', cursive;">TEXT</h2>
     </div>
    EOT;
    
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $xpath = new DOMXPath($doc);
    $div = $xpath->query('//div[contains(@class, "X2")]')->item(0);
    echo $div->textContent;
    

    Output:

    TEXT
    

    Demo on 3v4l.org
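
For context, here is a small sketch of why the regex from the question stops matching once the h2 is nested in the div, next to an XPath query that steps straight down to the h2. The variable names (`$regexMatched`, `$headingText`) and the stricter class test are illustrative assumptions, not part of the original answer. Note that `contains(@class, "X2")` would also match a class such as `X22`; the `concat`/`normalize-space` form below matches `X2` only as a whole class name.

```php
<?php
// Sample markup copied from the question.
$html = <<<'EOT'
<div class="X2">
  <h2 style="font-family: 'Pacifico', cursive;">TEXT</h2>
</div>
EOT;

// The question's pattern with the class name updated to X2. [^<]* cannot
// cross the opening <h2> tag, so the closing </div> is never reached.
$regex = '/<div class="X2">([^<]*)<\/div>/';
$regexMatched = preg_match($regex, $html, $matches) === 1;

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Match "X2" as a whole class token, then step down to the <h2> child.
$query = '//div[contains(concat(" ", normalize-space(@class), " "), " X2 ")]/h2';
$nodes = $xpath->query($query);
$headingText = $nodes->length > 0 ? trim($nodes->item(0)->textContent) : null;

var_dump($regexMatched); // bool(false)
var_dump($headingText);  // string(4) "TEXT"
```

Querying the h2 directly also avoids picking up the newlines and indentation that surround it inside the div.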

    To fit this into your function, something like the following should work:

    function get_text($id) {
        $html = file_get_contents("www.site.net/$id");
        $doc = new DOMDocument();
        $doc->loadHTML($html);
        $xpath = new DOMXPath($doc);
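        // Select every div whose class attribute contains "X2".
        $div = $xpath->query('//div[contains(@class, "X2")]');
        // DOMNodeList::$length works on every PHP version; count() on a
        // DOMNodeList requires PHP 7.2 or later.
        if ($div->length > 0) {
            return $div->item(0)->textContent;
        }
        else {
            return 'N/A';
        }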
    }
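
Real pages are often not valid HTML, and the fetch itself can fail, so a slightly more defensive variant may be worth considering. The sketch below is only one possible arrangement: `extract_div_text` is a hypothetical helper name, and splitting the parsing from the fetching is an assumption made so the logic can be tried without a network call. It relies on `libxml_use_internal_errors()` to keep parser warnings out of the output and trims the whitespace that `textContent` keeps around the h2.

```php
<?php
// Hypothetical helper (not from the original answer): parses markup that
// has already been fetched and returns the div's text, or 'N/A'.
function extract_div_text($html, $class = 'X2') {
    // file_get_contents() returns false on failure; treat that and an
    // empty body as "nothing found".
    if ($html === false || trim($html) === '') {
        return 'N/A';
    }

    // Collect libxml parse warnings instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Assumes $class is a simple class name with no quote characters.
    $nodes = $xpath->query('//div[contains(@class, "' . $class . '")]');
    if ($nodes === false || $nodes->length === 0) {
        return 'N/A';
    }

    // textContent keeps the newlines and indentation around the <h2>.
    $text = trim($nodes->item(0)->textContent);
    return $text !== '' ? $text : 'N/A';
}

echo extract_div_text("<div class=\"X2\">\n  <h2>TEXT</h2>\n</div>"), "\n"; // TEXT
echo extract_div_text('<p>no match</p>'), "\n";                            // N/A
echo extract_div_text(false), "\n";                                        // N/A
```

Inside `get_text($id)` the call would then be `return extract_div_text(file_get_contents("www.site.net/$id"));`. One thing to check in your own setup: `file_get_contents()` treats a string without a scheme such as `http://` as a local file path, so the address passed to it most likely needs the scheme included.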