I'm using Python, Selenium, and headless geckodriver to scrape a page. How can I get the unaltered HTML as it was downloaded, before JavaScript starts manipulating the elements? This is what I've tried:
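```python
from selenium import webdriver

fireFoxOptions = webdriver.FirefoxOptions()
fireFoxOptions.add_argument("-headless")  # run Firefox without a visible window
driver = webdriver.Firefox(options=fireFoxOptions)
driver.get(url)
print(driver.page_source)  # already reflects whatever the page's scripts changed
```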
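One way to do it is to disable JavaScript in the Firefox profile before loading the page, so that no script ever gets a chance to modify the DOM: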
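```python
from selenium import webdriver

profile = webdriver.FirefoxProfile()
# In Selenium 3.x the default profile freezes javascript.enabled to True, so the
# frozen entry itself is changed; a plain set_preference() would be overridden.
profile.DEFAULT_PREFERENCES['frozen']['javascript.enabled'] = False
profile.set_preference("app.update.auto", False)
profile.set_preference("app.update.enabled", False)
profile.update_preferences()  # write the preferences into the profile

options = webdriver.FirefoxOptions()
options.profile = profile
options.add_argument("-headless")  # run Firefox without a visible window
driver = webdriver.Firefox(options=options)

url = 'https://www.somewhere.com/some/path'
driver.get(url)
print(driver.page_source)  # DOM as parsed by Firefox, with no scripts executed
driver.quit()
```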
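Note that `driver.page_source` is always a serialization of the DOM Firefox built from the response, even with JavaScript disabled, so the parser may still normalize the markup (for example by inserting implied `<tbody>` elements). If you need the bytes exactly as the server sent them, a plain HTTP request is an alternative, since it never runs any scripts. The sketch below uses only the standard library: it serves a hypothetical page whose inline script would rewrite a paragraph in a browser, then fetches it with `urllib.request`. The page content, `PageHandler`, and the `fetch_raw_html` helper are illustrative assumptions, not part of Selenium.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical page: in a browser, the script would replace "original text".
PAGE = b"""<!DOCTYPE html>
<html><body>
<p id="greeting">original text</p>
<script>document.getElementById("greeting").textContent = "changed by JS";</script>
</body></html>"""


class PageHandler(BaseHTTPRequestHandler):
    """Serves PAGE for every GET request (illustrative test server)."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, format, *args):
        pass  # keep the console quiet


def fetch_raw_html(url: str) -> str:
    """Return the response body exactly as sent; no JavaScript is executed."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)


def main() -> str:
    server = HTTPServer(("127.0.0.1", 0), PageHandler)  # port 0: pick a free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        url = f"http://127.0.0.1:{server.server_port}/"
        html = fetch_raw_html(url)
        print(html)  # the <script> is still there, and the <p> is untouched
        return html
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    main()
```

For a real site you would call `fetch_raw_html` with its URL directly. Keep in mind that a server may answer a non-browser client differently (missing cookies, different headers, bot checks), in which case the Selenium route above may be the more faithful option.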