python selenium phantomjs content-type

How to get content-type from selenium page_source


I know the content type can be read from the response headers with urllib2:

import urllib2
response = urllib2.urlopen(url)
# Python names cannot contain a hyphen, hence content_type
content_type = response.info().getheader('Content-Type')

Now I need to execute JavaScript on the page, so I chose Selenium with PhantomJS to fetch it:

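from selenium import webdriver

driver = webdriver.PhantomJS()
driver.get(url)
# page_source is the rendered DOM as a string; it carries no HTTP headers
source = driver.page_source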

How can I get the content type from source without downloading the web page twice? I know I could save the output of response.read() as an HTML file and have the driver render that local file instead of downloading it again, but that is too slow. Any suggestions?


Solution

  • Selenium does not expose the response headers, but you can send a HEAD request with requests, which fetches only the headers and not the body:

    import requests

    # requests.head does not follow redirects unless allow_redirects=True is passed
    print(requests.head(url).headers["Content-Type"])

    You can also use httplib2, urllib2, etc.; there are numerous answers here showing how to send a HEAD request with various libraries.
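
For reference, here is a minimal sketch of the same HEAD request using only the standard library. It assumes Python 3, where urllib.request takes the place of urllib2. The helper name head_content_type and its timeout default are made up for this example and are not part of any library. Note that get_content_type() falls back to text/plain when the server sends no Content-Type header, so treat that value with some caution.

```python
import urllib.request


def head_content_type(url, timeout=10):
    """Return (media_type, charset) from a HEAD response.

    Hypothetical helper for illustration: only the headers are
    requested, so the page body is not downloaded here.
    """
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # response.headers behaves like an email.message.Message,
        # so it can split "text/html; charset=UTF-8" into its parts.
        headers = response.headers
        media_type = headers.get_content_type()    # e.g. "text/html"
        charset = headers.get_content_charset()    # None if not declared
        return media_type, charset
```

Calling head_content_type(url) with the same url that is passed to driver.get returns a tuple such as ('text/html', 'utf-8'). Some servers answer HEAD differently from GET, so it may be worth checking the result against a normal request for the sites you care about.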