When I scrape a page that lists products using the headless option, I get different results.
For the same query, the results are sometimes unsorted and sometimes in the proper sorted order.
Selenium Firefox browser:
firefox_options = Options()
firefox_options.headless = True
browser = webdriver.Firefox(options=firefox_options, executable_path=firefox_driver)
According to this post:
"firefox does not send different headers when using the headless option".
How can I use the headless option and still get consistent results from scraping?
Update:
It turns out that an ad popup window was hiding the price sort menu. Setting a constant window size, as posted by DebanjanB, solved the problem.
Thanks for any suggestions.
Ideally, using or not using firefox_options.headless = True shouldn't have any major effect on the elements rendered within the DOM Tree, but it may make a significant difference as far as the Viewport is concerned.
As an example, when GeckoDriver/Firefox is initialized with the --headless option, the default Viewport is width = 1366px, height = 768px, whereas when GeckoDriver/Firefox is initialized without the --headless option, the default Viewport is width = 1382px, height = 744px.
Example Code:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
options = webdriver.FirefoxOptions()
options.headless = True
driver = webdriver.Firefox(options=options, executable_path=r'C:\Utility\BrowserDrivers\geckodriver.exe')
driver.get("https://www.google.com/")
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.NAME, "q")))
print("Headless Firefox Initialized")
size = driver.get_window_size()
print("Window size: width = {}px, height = {}px".format(size["width"], size["height"]))
driver.quit()
driver = webdriver.Firefox(executable_path=r'C:\Utility\BrowserDrivers\geckodriver.exe')
driver.get("https://www.google.com/")
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.NAME, "q")))
print("Firefox Initialized")
size = driver.get_window_size()
print("Window size: width = {}px, height = {}px".format(size["width"], size["height"]))
driver.quit()
Console Output:
Headless Firefox Initialized
Window size: width = 1366px, height = 768px
Firefox Initialized
Window size: width = 1382px, height = 744px
From the above observation it can be inferred that with the --headless option GeckoDriver/Firefox opens the Browsing Context with a different Viewport (narrower here, 1366px versus 1382px), and hence the elements that are visible and identified can differ between the two modes.
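To make the difference concrete, here is a minimal sketch that computes it from the two sizes printed in the console output above. The helper name viewport_delta is hypothetical (it is not part of Selenium), and the numbers are the ones reported in this answer; they may differ on another machine.

```python
from typing import Dict


def viewport_delta(headless: Dict[str, int], headed: Dict[str, int]) -> Dict[str, int]:
    """Return headless minus headed for width and height, in pixels."""
    return {
        "width": headless["width"] - headed["width"],
        "height": headless["height"] - headed["height"],
    }


# Sizes taken from the console output in this answer.
headless_size = {"width": 1366, "height": 768}
headed_size = {"width": 1382, "height": 744}

delta = viewport_delta(headless_size, headed_size)
print(delta)  # {'width': -16, 'height': 24}
```

In this run the headless window is 16px narrower but 24px taller, so the layout can shift in either direction. That is why pinning an explicit size is safer than relying on the defaults.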
When using GeckoDriver/Firefox to initiate a Browsing Context, always open it in maximized mode or configure the size through set_window_size() as follows:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
options = webdriver.FirefoxOptions()
options.headless = True
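# Firefox takes the window size through --width/--height
# ("start-maximized" and "window-size=..." are Chrome arguments).
options.add_argument("--width=1400")
options.add_argument("--height=600")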
driver = webdriver.Firefox(options=options, executable_path=r'C:\Utility\BrowserDrivers\geckodriver.exe')
driver.get("https://www.google.com/")
driver.set_window_size(1920, 1080)
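As a rough sketch of how the same size could be enforced in both headless and normal runs, the helpers below build the Firefox size arguments and read the window size back after resizing. The names firefox_size_args, enforce_window_size, and StubDriver are hypothetical. StubDriver only stands in for a real driver so the snippet runs without a browser; a real Selenium driver exposes the same set_window_size() and get_window_size() methods.

```python
from typing import Dict, List, Tuple


def firefox_size_args(width: int, height: int) -> List[str]:
    """Build Firefox command-line arguments that request a window size."""
    return [f"--width={width}", f"--height={height}"]


def enforce_window_size(driver, width: int, height: int) -> Tuple[bool, Dict[str, int]]:
    """Resize the window, then read the size back to confirm it was applied."""
    driver.set_window_size(width, height)
    actual = driver.get_window_size()
    matches = actual["width"] == width and actual["height"] == height
    return matches, actual


class StubDriver:
    """Stand-in for a Selenium driver so the sketch runs without a browser."""

    def __init__(self) -> None:
        self._size = {"width": 1366, "height": 768}

    def set_window_size(self, width: int, height: int) -> None:
        self._size = {"width": width, "height": height}

    def get_window_size(self) -> Dict[str, int]:
        return dict(self._size)


args = firefox_size_args(1920, 1080)
ok, size = enforce_window_size(StubDriver(), 1920, 1080)
print(args)
print(ok, size)
```

With a real driver, each string from firefox_size_args() would be passed to options.add_argument() before the driver is created, and enforce_window_size(driver, 1920, 1080) would be called right after driver.get(...). If the returned flag is False, the window manager may have clamped the size, which is worth logging before scraping.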
You can find a couple of relevant discussions on window size in: