php · parsing · curl · web-scraping · screen-scraping

Getting the HTML from a hyperlink scraped from a website


I am trying to navigate to another webpage and get its HTML using a hyperlink I scraped (I need information stored on that page).

I'm currently having trouble getting the PHP curl function to grab the HTML code using the link I generate.

The portion of code I'm trying to use to build/get the HTML code is:

foreach($rows as $row)
{
    //Building the link itself. https://pr.mo.gov/ is the website; the attribute that is returned is the relative location.
    // /pharmacy-licensee-search-detail.asp?passkey=1285356 is an example of what I get from getAttribute('href')
    $holder = "https://pr.mo.gov/".$row->getAttribute('href');
    // $holder = https://pr.mo.gov/pharmacy-licensee-search-detail.asp?passkey=1285356 as per the example used in the comments above.
    echo $holder;
    echo "<br>";

    //trying to use curl to get the website html
    $c = curl_init("$holder");
    $html2 = curl_exec($c);
    //Trying to print out what has been received
    var_dump($html2);
    //It's printing out bool(false)
    curl_close($c);
}

The code prior to this point works fine; it gets me the HTML from the original webpage. If needed, I will post it.


Solution

  • You need to find out why the curl_exec call is returning false.

    Add echo curl_error($c) . "<br>"; after curl_exec to see the error message. Very likely it is related to the SSL certificate; if so, take a look at this question: PHP - SSL certificate error: unable to get local issuer certificate

    Even if the error turns out to be something else, calling curl_error after each curl function call will give you an explanation of the problem.
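    A minimal sketch putting that advice together (the helper name fetch_html is just illustrative). Note that without CURLOPT_RETURNTRANSFER, curl_exec prints the body directly and returns only true/false; with it, curl_exec returns the HTML as a string, or false on failure:

    ```php
    <?php
    // Fetch a page and report any cURL error; returns the HTML string, or false on failure.
    function fetch_html(string $url)
    {
        $c = curl_init($url);
        curl_setopt($c, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
        curl_setopt($c, CURLOPT_FOLLOWLOCATION, true); // follow redirects
        $html = curl_exec($c);
        if ($html === false) {
            // curl_error explains why the transfer failed, e.g. an SSL issue.
            // If it is "unable to get local issuer certificate", point cURL at
            // a CA bundle via CURLOPT_CAINFO (path depends on your system).
            echo 'cURL error: ' . curl_error($c) . "<br>";
        }
        curl_close($c);
        return $html;
    }

    $html2 = fetch_html("https://pr.mo.gov/pharmacy-licensee-search-detail.asp?passkey=1285356");
    ```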