While learning to use Goutte to scrape websites for descriptions, I found that it retrieves the text but strips all tags (e.g. <br>, <b>).
Is there a way to retrieve the values of all text within the div, including html tags?
Or is there an easier alternative that gives me this ability?
<?php
require_once "vendor/autoload.php";
use Goutte\Client;
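// Initialize a new client
$client = new Client();
// Request the page (placeholder URL; an absolute URL with a scheme is expected)
$crawler = $client->request('GET', "http://examplesite.com/example");
// Extract the text of every matching node (this is the call that drops the tags)
$description = $crawler->filter('element.class')->extract(['_text']);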
?>
You can use the html() function:
http://api.symfony.com/4.0/Symfony/Component/DomCrawler/Crawler.html#method_html
Like this:
$descriptions = $crawler->filter('element.class')->each(function($node) {
    // html() returns the node's inner HTML, tags included
    return $node->html();
});
Afterwards, you can use PHP's strip_tags function to clean it up; its second argument lists the tags to keep.
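For the clean-up step, here is a minimal sketch of strip_tags with an allow-list. The $html string is a made-up sample standing in for one value returned by $node->html(), and the choice of <br> and <b> as the tags to keep is taken from the question, so adjust both to your page.

```php
<?php
// Hypothetical sample of what $node->html() might return for one node.
$html = '<p>First line<br>Second <b>bold</b> word <span class="x">and a span</span></p>';

// Keep only <br> and <b>; every other tag is removed, but its text content stays.
$clean = strip_tags($html, '<br><b>');

echo $clean, PHP_EOL;
// prints: First line<br>Second <b>bold</b> word and a span
```

To apply this to the whole result, something like array_map over $descriptions with the same strip_tags call should work. Note that strip_tags leaves attributes on the allowed tags untouched, so it is not a sanitizer for untrusted HTML.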