Goutte / Web Scraping - How to intercept and download a file


Firstly, thanks in advance for your help here, it's really appreciated!

I've successfully managed to get Goutte to authenticate, hit a URL, change a select field and click a submit button.

The page then reloads and as it finishes loading, it downloads a file to the client.

How do I intercept this file within Goutte? I've read as much of the documentation as I can but can't find an answer. Once I have the file, I want to read through it and save it locally.

Which of the two I do (traverse it or just save it) depends on the file type.

Thanks :-)


Solution

  • This is not easy to achieve. In my situation, I visit the URL the file is served from (after authentication); the server then returns the file as the Page object, and afterwards you can read the file's bytes from the content of the page.

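    // $session is an already-authenticated session; $url contains the path to the file.
    $session->visit($url);
    $page = $session->getPage();
    // file_put_contents() returns the number of bytes written, or false on failure.
    $saved = file_put_contents($targetFilePath, $page->getContent());
    if ($saved === false) {
        throw new RuntimeException("Could not write to $targetFilePath");
    }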
    

    In my case, I am downloading a zip file. In your case, you could save it to a temporary location, detect the type, then move it to the desired directory. Hope this helps.
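
The "save to a temporary location, detect the type, then move it" idea could look roughly like the sketch below. It is only an illustration: the function name `saveDownloadedContent` and the MIME-to-extension map are made up for this example, and it assumes PHP's bundled fileinfo extension is enabled. `$content` would be whatever `$page->getContent()` returned.

```php
<?php
// Hypothetical helper (not part of Goutte): writes downloaded bytes to a
// temporary file, detects the type from the content, then moves the file.
function saveDownloadedContent(string $content, string $targetDir): string
{
    // 1. Save to a temporary location first.
    $tmpPath = tempnam(sys_get_temp_dir(), 'dl_');
    if ($tmpPath === false || file_put_contents($tmpPath, $content) === false) {
        throw new RuntimeException('Could not write the temporary file.');
    }

    // 2. Detect the MIME type from the file content (needs ext-fileinfo).
    $finfo = new finfo(FILEINFO_MIME_TYPE);
    $mime = $finfo->file($tmpPath);

    // Illustrative map only; extend it with the types you expect.
    $extensions = [
        'application/zip' => 'zip',
        'application/pdf' => 'pdf',
        'text/plain'      => 'txt',
        'text/html'       => 'html',
    ];
    $ext = $extensions[$mime] ?? 'bin';

    // 3. Move the file to the desired directory under a type-based name.
    $targetPath = rtrim($targetDir, '/') . '/' . basename($tmpPath) . '.' . $ext;
    if (!rename($tmpPath, $targetPath)) {
        unlink($tmpPath);
        throw new RuntimeException("Could not move the file to $targetPath");
    }

    return $targetPath;
}
```

From there you can branch on the extension (or on the detected MIME type) to decide whether to parse the file or just keep it, for example `$path = saveDownloadedContent($page->getContent(), '/path/to/downloads');`, where the directory is a placeholder.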