
Python twill: download file accessible through PHP script


I use twill to navigate a website protected by a login form:

from twill.commands import *

go('http://www.example.com/login/index.php') 
fv("login_form", "identifiant", "login")
fv("login_form", "password", "pass")
formaction("login_form", "http://www.example.com/login/control.php")
submit()
go('http://www.example.com/accueil/index.php')

On this last page I want to download an Excel file which is accessible through a div with the following attribute:

onclick="OpenWindowFull('../util/exports/control.php?action=export','export',200,100);"
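
For reference, the relative path in that attribute resolves against the page URL as sketched below. This uses only the standard library; the regular expression is an assumption about the shape of the onclick value (target path as the first single-quoted argument), not something the site guarantees.

```python
import re
from urllib.parse import urljoin

page_url = 'http://www.example.com/accueil/index.php'
onclick = "OpenWindowFull('../util/exports/control.php?action=export','export',200,100);"

# Assumption: the export path is the first single-quoted argument of OpenWindowFull.
match = re.search(r"OpenWindowFull\('([^']+)'", onclick)

# Resolve the relative path against the URL of the page that contains the div.
export_url = urljoin(page_url, match.group(1))
print(export_url)
```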

With twill I can open the URL of the PHP script and display the content of the file:

go('http://www.example.com/util/exports/control.php?action=export')
show()

However, this returns a string containing the raw content, which is unusable as is. Is there a way to retrieve the Excel file directly, similar to urllib.urlretrieve()?


Solution

  • I managed to do it by passing twill's cookie jar to requests.

    Note: I could not use requests alone because of an intricate check at login (I was not able to figure out the correct headers or other options).

    import requests
    from twill.commands import *
    
    # showing login form with twill
    go('http://www.example.com/login/index.php') 
    showforms()
    
    # posting login form with twill
    fv("login_form", "identifiant", "login")
    fv("login_form", "password", "pass")
    formaction("login_form", "http://www.example.com/login/control.php")
    submit()
    
    # getting binary content with requests using twill cookie jar
    # (_session is a private twill attribute and may differ between twill versions)
    cookies = requests.utils.dict_from_cookiejar(get_browser()._session.cookies)
    url = 'http://www.example.com/util/exports/control.php?action=export'
    
    # request first, so that a failed download does not leave an empty out.xls behind
    response = requests.get(url, stream=True, cookies=cookies)
    
    if not response.ok:
        raise Exception('Could not get file from ' + url)
    
    with open('out.xls', 'wb') as handle:
        for block in response.iter_content(1024):
            handle.write(block)
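
The download step above could be wrapped in a small reusable function. This is only a sketch: the name `download_file` and its parameters are hypothetical, and it assumes the cookies obtained from twill are passed in as a dict or cookie jar. It checks the HTTP status before opening the output file, so a rejected request does not create an empty file.

```python
import requests


def download_file(url, dest_path, cookies=None, session=None, chunk_size=8192):
    """Stream `url` to `dest_path` and return the number of bytes written.

    `cookies` may be a dict or a cookie jar (for example the one taken from twill).
    `session` is optional; any object with a requests-style `get` method works.
    """
    http = session or requests.Session()
    response = http.get(url, cookies=cookies, stream=True, timeout=30)
    response.raise_for_status()  # raise before the output file is created

    written = 0
    with open(dest_path, 'wb') as handle:
        # Write the body block by block so large exports are not held in memory.
        for block in response.iter_content(chunk_size):
            handle.write(block)
            written += len(block)
    return written
```

With the `cookies` and `url` variables from the snippet above, the call would be `download_file(url, 'out.xls', cookies=cookies)`.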