
Crawling a website using Laravel & Elvedia\Goutte: How to extract JSON


I successfully accessed a remote JSON resource using Goutte with Laravel 4:

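$client = Goutte::getNewClient();

// Log in first; the client keeps the session cookie for the following request.
// Deleting one "/" from the "//*" line below turns this block into a /* ... */ comment.
//*
$crawler = $client->request('GET', 'http://domain.mg/admin');

// Fill in and submit the login form (hidden fields such as the CSRF token are sent along)
$form = $crawler->selectButton('Login')->form();
$crawler = $client->submit($form, array('username' => 'username', 'password' => 'password'));

//*/

$crawler = $client->request('GET', 'http://domain.mg/usergroup/list'); // Yields JSON Response

return dd($crawler);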

It produces output like this:

object(Symfony\Component\DomCrawler\Crawler)#285 (4) { ["uri":protected]=> string(36) "http://domain.mg/usergroup/list" ["defaultNamespacePrefix":"Symfony\Component\DomCrawler\Crawler":private]=> string(7) "default" ["namespaces":"Symfony\Component\DomCrawler\Crawler":private]=> array(0) { } ["storage":"SplObjectStorage":private]=> array(1) { ["0000000075faaa10000000001af55ef8"]=> array(2) { ["obj"]=> object(DOMElement)#241 (17) { ["tagName"]=> string(4) "html" ["schemaTypeInfo"]=> NULL ["nodeName"]=> string(4) "html" ["nodeValue"]=> string(438) "[{"id":1,"group_name":"Compte principal","group_desc":"Administrateur","group_level":9},{"id":2,"group_name":"Profil pour les comptables","group_desc":"Comptables","group_level":2},{"id":3,"group_name":"Validateur d'op\u00e9ration","group_desc":"Superviseur","group_level":9},{"id":18,"group_name":"No Comment","group_desc":"Autres employ\u00e9s","group_level":6},{"id":41,"group_name":"Invit\u00e9","group_desc":"Guest","group_level":2}]" ["nodeType"]=> int(1) ["parentNode"]=> string(22) "(object value omitted)" ["childNodes"]=> string(22) "(object value omitted)" ["firstChild"]=> string(22) "(object value omitted)" ["lastChild"]=> string(22) "(object value omitted)" ["previousSibling"]=> string(22) "(object value omitted)" ["attributes"]=> string(22) "(object value omitted)" ["ownerDocument"]=> string(22) "(object value omitted)" ["namespaceURI"]=> NULL ["prefix"]=> string(0) "" ["localName"]=> string(4) "html" ["baseURI"]=> NULL ["textContent"]=> string(438) "[{"id":1,"group_name":"Compte principal","group_desc":"Administrateur","group_level":9},{"id":2,"group_name":"Profil pour les comptables","group_desc":"Comptables","group_level":2},{"id":3,"group_name":"Validateur d'op\u00e9ration","group_desc":"Superviseur","group_level":9},{"id":18,"group_name":"No Comment","group_desc":"Autres employ\u00e9s","group_level":6},{"id":41,"group_name":"Invit\u00e9","group_desc":"Guest","group_level":2}]" } ["inf"]=> NULL } } }

I got stuck extracting the JSON string from the internal representation of the $crawler object. How can that be done?


Solution

  • Looking through the documentation for the Symfony\Component\DomCrawler\Crawler class, I found

    public string html()
    
        Returns the first node of the list as HTML.
    
        Return Value
    
        string  The node html
    

    which works as I expected.

    Replacing return dd($crawler); with return $crawler->html(); yields:

    [{"id":1,"group_name":"Compte principal","group_desc":"Administrateur","group_level":9},{"id":2,"group_name":"Profil pour les comptables","group_desc":"Comptables","group_level":2},{"id":3,"group_name":"Validateur d'op\u00e9ration","group_desc":"Superviseur","group_level":9},{"id":18,"group_name":"No Comment","group_desc":"Autres employ\u00e9s","group_level":6},{"id":41,"group_name":"Invit\u00e9","group_desc":"Guest","group_level":2}]

    Conclusion

    Goutte handled the complex login process (Laravel's CSRF mechanism) very well, but I dislike extracting a JSON string with html(), since that method is documented as returning the node as HTML.

    In my opinion, return $crawler->text(), which produces the same result, is the more "neutral" choice.
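Once text() (or html()) has returned the string, it still has to be decoded before the groups can be used as PHP data. Below is a minimal sketch, assuming PHP 5.5+ (for array_column). decodeJsonBody() is a hypothetical helper of my own, not part of Goutte or DomCrawler: json_decode($body, true) turns the payload into associative arrays, and json_last_error() catches the case where the server answered with an HTML page (for example, the login form after a failed login) instead of JSON. The sample string is a shortened copy of the output above. As an untested alternative, $client->getResponse()->getContent(), inherited from Symfony's BrowserKit client, should return the raw response body without going through the DOM at all.

```php
<?php
/**
 * Hypothetical helper (not part of Goutte or DomCrawler): decode a JSON
 * string into associative arrays, throwing instead of silently returning null.
 */
function decodeJsonBody($body)
{
    // true => JSON objects become associative arrays rather than stdClass objects
    $data = json_decode($body, true);
    if (json_last_error() !== JSON_ERROR_NONE) {
        throw new RuntimeException('Not valid JSON, json_last_error() = ' . json_last_error());
    }
    return $data;
}

// Shortened copy of the string returned by $crawler->text() above
$json = '[{"id":1,"group_name":"Compte principal","group_desc":"Administrateur","group_level":9},'
      . '{"id":41,"group_name":"Invit\u00e9","group_desc":"Guest","group_level":2}]';

$groups = decodeJsonBody($json);

// Build an id => group_name lookup table (array_column needs PHP 5.5+)
$namesById = array_column($groups, 'group_name', 'id');
foreach ($namesById as $id => $name) {
    echo $id, ': ', $name, PHP_EOL;
}

// An HTML page instead of JSON is rejected rather than decoded to null
try {
    decodeJsonBody('<html><body>Login</body></html>');
} catch (RuntimeException $e) {
    echo 'Rejected: ', $e->getMessage(), PHP_EOL;
}
```

Throwing on invalid input keeps a failed login from surfacing later as a confusing "null" somewhere further down the script.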