python, selenium-webdriver, button, selenium-chromedriver, chrome-for-testing

Clicking the cookie button gives inconsistent results


I am currently trying to scrape the following page.

I have a problem clicking the cookie button: it behaves inconsistently. In some runs the button gets clicked and the process continues as it should, but in other runs it just stops and I get the following errors:

[24812:22288:0512/162346.912:ERROR:chrome_browser_cloud_management_controller.cc(161)] Cloud management controller initialization aborted as CBCM is not enabled. Please use the `--enable-chrome-browser-cloud-management` command line flag to enable it if you are not using the official Google Chrome build.

[24812:22668:0512/162346.939:ERROR:sandbox_win.cc(895)] Sandbox cannot access executable. Check filesystem permissions are valid. See https://bit.ly/31yqMJR.: Access is denied. (0x5)

DevTools listening on ws://127.0.0.1:65082/devtools/browser/d3c31f38-a536-4c0c-96d6-89734ca592e6

[24812:22288:0512/162347.064:ERROR:network_service_instance_impl.cc(599)] Network service crashed, restarting service.

This is the code I am using (the full script is longer, but when it stops, it doesn’t get past this point):

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup
import requests
import re

chrome_options = Options()

chrome_options.add_argument('--headless')
chrome_options.add_argument('--disable-gpu')

driver = webdriver.Chrome(options=chrome_options)

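try:
    url = 'https://www.arbeitsagentur.de/jobsuche/suche?angebotsart=34&was=Programmierer%2Fin&sort=veroeffdatum&id=10000-1187619796-S'
    driver.get(url)

    # Wait until the cookie button is clickable, then click it.
    cookie_button = WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, '[aria-label="Auswahl bestätigen – Ausgewählte Cookies werden akzeptiert"]')))
    cookie_button.click()

    # (rest of the script omitted in the question)
finally:
    # Close the browser even if the wait above times out.
    driver.quit()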

Another issue is that when I run the code in headless mode, it is much less likely to work as it should.

I am using “Google Chrome for Testing” version 124.0.6367.78 with the matching ChromeDriver, on Windows 10 Pro.

I have deleted and re-downloaded the browser and the driver. I have also checked that the browser has all the permissions it needs, so the firewall does not block it.

Does anyone have any idea what might be the issue? Any help would be much appreciated.


Solution

  • The target element is inside a shadow root. In such cases, you first need to locate the shadow host, then get its shadow root and search for the target element from there.

    Try the code below; it should consistently click the target element (Auswahl bestätigen) in both headless and headful mode.

    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    chrome_options = Options()
    chrome_options.add_argument('--headless')
    driver = webdriver.Chrome(options=chrome_options)
    url = 'https://www.arbeitsagentur.de/jobsuche/suche?angebotsart=34&was=Programmierer%2Fin&sort=veroeffdatum&id=10000-1187619796-S'
    driver.get(url)
    driver.maximize_window()
    # Locate the shadow host, i.e. the custom element that owns the shadow root.
    shadow_host = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.TAG_NAME, 'bahf-cookie-disclaimer-dpl3')))
    # Get the shadow root; elements inside it must be searched from here.
    shadow_root = shadow_host.shadow_root
    # Wait for the button inside the shadow root to become clickable.
    cookie_button = WebDriverWait(shadow_root, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button.ba-btn.ba-btn-contrast")))
    # Click through JavaScript rather than a native WebDriver click.
    driver.execute_script("arguments[0].click();", cookie_button)
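
The same host-then-root traversal can also be done entirely in JavaScript and run through execute_script. The sketch below only builds the script text; `shadow_query_script` is a hypothetical helper name (not part of Selenium), the host tag and button selector are the ones used in the answer above, and it assumes the shadow roots are open (a closed root is not reachable through `shadowRoot`).

```python
import json


def shadow_query_script(host_selectors, target_selector):
    """Build a JavaScript snippet that walks through nested shadow hosts.

    Hypothetical helper, not part of Selenium. The snippet returns the
    target element, or null if a host or its (open) shadow root is missing.
    """
    lines = ["let node = document;"]
    for selector in host_selectors:
        # json.dumps yields a safely quoted JavaScript string literal.
        lines.append(f"node = node.querySelector({json.dumps(selector)});")
        lines.append("if (!node || !node.shadowRoot) return null;")
        lines.append("node = node.shadowRoot;")
    lines.append(f"return node.querySelector({json.dumps(target_selector)});")
    return "\n".join(lines)


# Selectors taken from the answer above.
script = shadow_query_script(
    ["bahf-cookie-disclaimer-dpl3"],
    "button.ba-btn.ba-btn-contrast",
)
print(script)
```

With a live driver, `driver.execute_script(script)` should return the button as a WebElement, or `None` if it is not there yet, so `WebDriverWait(driver, 10).until(lambda d: d.execute_script(script))` can be used to poll until the button appears. Passing more than one host selector handles shadow roots nested inside each other.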
    

    A few references for more information: