python, selenium, geckodriver

Get unaltered HTML via Selenium


I'm using Python with Selenium and headless geckodriver to scrape a page. How can I get the unaltered HTML, as it was downloaded, before JavaScript started manipulating the elements? This is what I've tried:

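from selenium import webdriver

fireFoxOptions = webdriver.FirefoxOptions()
fireFoxOptions.add_argument("-headless")  # run Firefox without a visible window
driver = webdriver.Firefox(options=fireFoxOptions)
driver.get(url)  # url: address of the page being scraped
print(driver.page_source)  # prints the DOM after JavaScript has already run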

Solution

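  • This seems to be the way to do it: disable JavaScript in the Firefox profile, so that no script can modify the page before driver.page_source is read: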

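    from selenium import webdriver

    profile = webdriver.FirefoxProfile()
    # Turn JavaScript off so scripts cannot alter the DOM after the page loads.
    # This writes to a Selenium-internal dict, so it may depend on the Selenium version.
    profile.DEFAULT_PREFERENCES['frozen']['javascript.enabled'] = False
    # Keep Firefox from auto-updating while this profile is in use.
    profile.set_preference("app.update.auto", False)
    profile.set_preference("app.update.enabled", False)
    profile.update_preferences()
    options = webdriver.FirefoxOptions()
    options.profile = profile
    options.add_argument("-headless")  # run Firefox without a visible window
    driver = webdriver.Firefox(options=options)
    url = 'https://www.somewhere.com/some/path'
    driver.get(url)
    print(driver.page_source)  # markup with no script-driven changes applied
    driver.quit()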
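
Even with JavaScript disabled, driver.page_source is the browser's serialization of the parsed DOM, not the exact text the server sent. If the browser session is not otherwise needed (no login cookies and so on), one alternative that avoids Selenium altogether is to request the page with Python's standard library, which never executes scripts. A minimal sketch, not part of the answer above: fetch_raw_html is a hypothetical helper name, and the User-Agent string and timeout are assumed values.

```python
import urllib.request


def fetch_raw_html(url: str, timeout: float = 10.0) -> str:
    """Return the response body as sent by the server; no JavaScript is run."""
    # Assumption: a browser-like User-Agent, since some servers reject urllib's default.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Fall back to UTF-8 when the Content-Type header names no charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling fetch_raw_html(url) with the same url as above returns the markup as a string. Pages that only serve their content to a logged-in or JavaScript-capable client would likely still need the Selenium approach.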