php symfony domcrawler

Symfony's DomCrawler does not find a specific tag


I'm using DomCrawler to get data from a Google Play page, and it works in 99% of cases. However, I stumbled upon a page where it cannot find a specific div. I checked the HTML and the div is definitely there. My code is:

require __DIR__.'/vendor/autoload.php';
use Symfony\Component\DomCrawler\Crawler;

$app_id = 'com.balintinfotech.sinhalesekeyboardfree';

$response = file_get_contents('https://play.google.com/store/apps/details?id='.$app_id);
$crawler = new Crawler($response);
echo $crawler->filter('div[itemprop="datePublished"]')->text();

When I run it against that specific page, I get:

PHP Fatal error: Uncaught InvalidArgumentException: The current node list is empty.

However, if I use any other ID, I get the desired result. What exactly is it about that page that breaks DomCrawler?


Solution

  • As you correctly figured out, this doesn't happen in the English version, but it does in the Spanish one.

    One difference I could spot was a comment by a user saying නියමයි ඈ. Something there seems to be tripping up the Crawler. If you replace the null character (\x00) with an empty string, it correctly finds what you're looking for:

    <?php
    require __DIR__.'/vendor/autoload.php'; // Composer autoloader, needed for the Crawler class

    $app_id = 'com.balintinfotech.sinhalesekeyboardfree';
    $response = file_get_contents('https://play.google.com/store/apps/details?hl=en&id='.$app_id);
    // Strip null bytes, which appear to stop the Crawler from seeing the rest of the markup
    $response = str_replace("\x00", "", $response);
    $crawler = new Symfony\Component\DomCrawler\Crawler($response);
    var_dump($crawler->filter('div[itemprop="datePublished"]')->text()); // string(14) "March 14, 2017"
    

    I'll try to look more into this.
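
To check whether a fetched page is affected before handing it to the Crawler, the null-byte workaround above can be wrapped in two small helpers. This is only a minimal sketch: `countNullBytes`, `stripNullBytes`, and the `$sample` string are hypothetical names and data made up for illustration, not part of DomCrawler. The underlying cause is not confirmed here; one plausible explanation is that the libxml-based parser behind the Crawler mishandles the NUL byte.

```php
<?php
declare(strict_types=1);

/**
 * Count NUL bytes in a fetched document. A non-zero result suggests the
 * markup may trip up the parser in the way described above.
 */
function countNullBytes(string $html): int
{
    return substr_count($html, "\x00");
}

/**
 * Return the markup with every NUL byte removed. All other bytes are left
 * untouched; a NUL byte never occurs inside a multi-byte UTF-8 sequence,
 * so non-ASCII text such as the Sinhala comment is preserved.
 */
function stripNullBytes(string $html): string
{
    return str_replace("\x00", '', $html);
}

// Hypothetical sample: a user comment with a stray NUL byte, followed by the div we want.
$sample = "<p>නියමයි\x00</p><div itemprop=\"datePublished\">March 14, 2017</div>";

$clean = stripNullBytes($sample);

echo countNullBytes($sample), PHP_EOL; // NUL bytes before cleaning
echo countNullBytes($clean), PHP_EOL;  // NUL bytes after cleaning
```

The cleaned string can then be passed to `new Crawler($clean)` as in the code above. Checking `$crawler->filter(...)->count()` before calling `text()` avoids the `InvalidArgumentException` if the node is still missing.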