I want to scrape some elements from the Duden web page at this URL: https://www.duden.de/rechtschreibung/aussuchen. When I open the page manually, no pop-up occurs, but when I open it with Selenium in Python, this pop-up appears: [image of pop-up]
I have already tried a lot of things, such as blocking pop-ups in general or trying to click the accept button. None of it works.
I also tried to find an element inside the frame and print a statement to check whether Selenium can find it, but that does not work either.
Does anyone have an idea why this happens, or what else I could try?
These are a few things I tried:
For blocking:
def getAllWordForms(word):
    options = Options()
    profile = webdriver.FirefoxProfile()
    profile.set_preference("dom.disable_open_during_load", False)
    driver = webdriver.Firefox(firefox_profile=profile, options=options, executable_path=os.path.join(driver_location, 'geckodriver'))
    main_url = 'https://www.duden.de/rechtschreibung/'
    word_url = main_url + '{}'.format(word)
    driver.get(word_url)
To see if it can find an element in the pop-up frame:
def getAllWordForms(word):
    options = Options()
    driver = webdriver.Firefox(options=options, executable_path=os.path.join(driver_location, 'geckodriver'))
    main_url = 'https://www.duden.de/rechtschreibung/'
    word_url = main_url + '{}'.format(word)
    driver.get(word_url)
    driver.implicitly_wait(10)
    driver.switch_to.frame(1)
    if driver.find_elements_by_class_name('message-button'):
        print('yes')
To click the button:
def getAllWordForms(word):
    options = Options()
    options.headless = False
    driver = webdriver.Firefox(options=options, executable_path=os.path.join(driver_location, 'geckodriver'))
    main_url = 'https://www.duden.de/rechtschreibung/'
    word_url = main_url + '{}'.format(word)
    driver.get(word_url)
    driver.implicitly_wait(10)
    driver.switch_to.frame(1)
    button = driver.find_element_by_xpath("//button[@aria-label='AKZEPTIEREN']")
    button.click()
    driver.switch_to.default_content()
I tried various combinations, but none of them works.
The elements of the page are structured like this: [structure of page_1] [structure of page_2]
I hope I explained this clearly and that someone can help me.
Every time you launch your webdriver, it uses a new temporary profile. That profile has no cookies, so the site sees you as a new user who needs to accept the cookie message.
I had a look at the site: to close the message you need to switch into its iframe. You were close with your solution; it may just have needed a different way of selecting the frame.
This code works for me:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://www.duden.de/rechtschreibung/aussuchen")

# The consent message lives in its own iframe; select it by id rather than by index
iframe = driver.find_element(By.XPATH, "//iframe[contains(@id,'sp_message_iframe')]")
driver.switch_to.frame(iframe)

# Wait until the accept button is clickable, then click it
cookie_accept = WebDriverWait(driver, 30).until(EC.element_to_be_clickable((By.XPATH, "//button[text()='AKZEPTIEREN']")))
cookie_accept.click()

# Return to the main document before touching the rest of the page
driver.switch_to.default_content()
Remember to switch back to the default frame at the end with driver.switch_to.default_content(); then you can continue your script.
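If you scrape many words, the steps above (find the iframe, switch into it, click, switch back) could be wrapped in a small helper. The following is only a sketch under some assumptions: the name accept_consent and its timeout and poll parameters are made up for illustration, the two XPaths are taken from the code above, and polling is done with a plain loop instead of WebDriverWait so that the function only relies on the driver's own find_elements and switch_to methods. The string "xpath" is the value of By.XPATH.

```python
import time


def accept_consent(driver, timeout=30.0, poll=0.5):
    """Click the consent button inside the consent iframe, if it shows up.

    Hypothetical helper: returns True if the button was clicked and False
    if no banner appeared within `timeout` seconds. In both cases the
    driver is switched back to the main document before returning.
    """
    frame_xpath = "//iframe[contains(@id,'sp_message_iframe')]"
    button_xpath = "//button[text()='AKZEPTIEREN']"
    deadline = time.monotonic() + timeout
    try:
        while True:
            # Start every attempt from the top-level document.
            driver.switch_to.default_content()
            frames = driver.find_elements("xpath", frame_xpath)
            if frames:
                driver.switch_to.frame(frames[0])
                buttons = driver.find_elements("xpath", button_xpath)
                if buttons:
                    buttons[0].click()
                    return True
            if time.monotonic() >= deadline:
                return False
            time.sleep(poll)
    finally:
        # Leave the iframe no matter how the loop ended.
        driver.switch_to.default_content()
```

With a real driver you would call accept_consent(driver) right after driver.get(word_url). Because it returns False instead of raising when no banner appears, the same call should also be safe on later page loads in the same session, once the cookie has been set.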