https://github.com/paquettg/php-html-parser Does anybody know how to follow redirects in this library? For example:
require "vendor/autoload.php";
use PHPHtmlParser\Dom;
$dom = new Dom;
$dom->loadFromUrl($html);
Versions:
Why does the library not natively allow redirects?
The loadFromUrl method has the following implementation (as of version 3.1.1, current at the time of writing):
public function loadFromUrl(string $url, ?Options $options = null, ?ClientInterface $client = null, ?RequestInterface $request = null): Dom
{
    if ($client === null) {
        $client = new Client();
    }
    if ($request === null) {
        $request = new Request('GET', $url);
    }
    $response = $client->sendRequest($request);
    $content = $response->getBody()->getContents();
    return $this->loadStr($content, $options);
}
The line $response = $client->sendRequest($request); calls into Guzzle's Client: https://github.com/guzzle/guzzle/blob/master/src/Client.php#L131
/**
 * The HttpClient PSR (PSR-18) specify this method.
 *
 * @inheritDoc
 */
public function sendRequest(RequestInterface $request): ResponseInterface
{
    $options[RequestOptions::SYNCHRONOUS] = true;
    $options[RequestOptions::ALLOW_REDIRECTS] = false;
    $options[RequestOptions::HTTP_ERRORS] = false;
    return $this->sendAsync($request, $options)->wait();
}
The line $options[RequestOptions::ALLOW_REDIRECTS] = false; turns off redirects unconditionally. Whatever you configure on the Client or Request you pass in, sendRequest overrides it for that call, so the redirect response itself is returned instead of being followed.
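To see the difference without touching the network, here is a minimal sketch using Guzzle's MockHandler. The mockedClient() helper and the example.test URLs are made up for illustration; the mocked "server" answers with a 301 first and a 200 second.

```php
<?php
// Sketch: compare PSR-18 sendRequest() with Guzzle's request() on a mocked 301.
require 'vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\Handler\MockHandler;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\Psr7\Request;
use GuzzleHttp\Psr7\Response;

// Hypothetical helper: a client whose mocked server answers 301 first, then 200.
function mockedClient(): Client
{
    $mock = new MockHandler([
        new Response(301, ['Location' => 'https://example.test/']),
        new Response(200, [], '<a href="/">home</a>'),
    ]);

    // HandlerStack::create() adds Guzzle's default middleware, including redirects.
    return new Client(['handler' => HandlerStack::create($mock)]);
}

// PSR-18 path (the one loadFromUrl takes): the redirect is not followed.
$psr18Status = mockedClient()
    ->sendRequest(new Request('GET', 'http://example.test/'))
    ->getStatusCode();

// Guzzle's own request(): allow_redirects is on by default, so the 301 is followed.
$guzzleStatus = mockedClient()
    ->request('GET', 'http://example.test/')
    ->getStatusCode();

var_dump($psr18Status, $guzzleStatus);
```

With Guzzle 7 this should report 301 for the sendRequest call and 200 for the request call, which is why the parser ends up with the redirect page rather than the target page.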
How to follow redirects with the library
Since loadFromUrl simply makes the request, reads the response body, and hands it to loadStr, we can do the same thing ourselves with Guzzle directly (it is already a dependency of the library).
<?php
use GuzzleHttp\Client;
use GuzzleHttp\Exception\GuzzleException;
use PHPHtmlParser\Dom;

// Include the autoloader
include_once("vendor/autoload.php");

$client = new Client();
try {
    // allow_redirects is shown for verbosity's sake; it is on by default for Guzzle clients.
    $response = $client->request('GET', 'http://theeasyapi.com', ['allow_redirects' => true]);
    // This would work exactly the same:
    //$response = $client->request('GET', 'http://theeasyapi.com');
} catch (GuzzleException $e) {
    // Probably do something with $e
    var_dump($e->getMessage());
    exit;
}

// Parse the body of the final (post-redirect) response
$dom = new Dom();
$domExample = $dom->loadStr($response->getBody()->getContents());
foreach ($domExample->find('a') as $link) {
    var_dump($link->text);
}
The code above instantiates a new Guzzle Client and makes a request to the URL with redirects allowed, then passes the final response body to loadStr. The website used in this example 301-redirects from HTTP to HTTPS.
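If you would rather keep calling loadFromUrl, another option is to pass in your own client, since the signature shown earlier accepts one. Below is a rough sketch of such a client. RedirectFollowingClient is a hypothetical name, not part of either library, and it assumes that the ClientInterface in that signature is the PSR-18 interface (Psr\Http\Client\ClientInterface). It delegates to Guzzle's send(), which, unlike sendRequest(), accepts request options. The demo uses a mocked 301 followed by a 200 so it runs offline.

```php
<?php
require 'vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\Handler\MockHandler;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\Psr7\Request;
use GuzzleHttp\Psr7\Response;
use Psr\Http\Client\ClientInterface;
use Psr\Http\Message\RequestInterface;
use Psr\Http\Message\ResponseInterface;

// Hypothetical adapter: a PSR-18 client that follows redirects by
// delegating to Guzzle's send() with explicit request options.
final class RedirectFollowingClient implements ClientInterface
{
    /** @var Client */
    private $guzzle;

    public function __construct(Client $guzzle)
    {
        $this->guzzle = $guzzle;
    }

    public function sendRequest(RequestInterface $request): ResponseInterface
    {
        return $this->guzzle->send($request, [
            'allow_redirects' => true,
            // PSR-18 clients return 4xx/5xx responses instead of throwing.
            'http_errors' => false,
        ]);
    }
}

// Offline demo: the mocked server answers 301 first, then 200.
$mock = new MockHandler([
    new Response(301, ['Location' => 'https://example.test/']),
    new Response(200, [], '<a href="/">home</a>'),
]);
$client = new RedirectFollowingClient(
    new Client(['handler' => HandlerStack::create($mock)])
);

$response = $client->sendRequest(new Request('GET', 'http://example.test/'));
$status = $response->getStatusCode();
$body = (string) $response->getBody();

var_dump($status, $body);
```

Under that assumption, an instance could be passed as the third argument, along the lines of $dom->loadFromUrl($url, null, new RedirectFollowingClient(new Client())); I have not checked this against every version of the library, so treat it as a starting point.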