python selenium selenium-webdriver xpath webdriverwait

Get value from a website using selenium in python


I am taking my first steps with Selenium in Python and want to extract a certain value from a webpage. The value I need is the ID (Melde-ID), which is 355460. In the HTML I found the two lines containing this information:

<h3 _ngcontent-wwf-c32="" class="title"> Melde-ID: 355460 </h3><span _ngcontent-wwf-c32="">
<div _ngcontent-wwf-c27="" class="label"> Melde-ID </div><div _ngcontent-wwf-c27="" class="value">

I have spent about two hours searching for the right command, but I don't know what to actually look for in the HTML. The website is HTML with JavaScript modules. Opening the URL with Selenium works.

(At first I tried using BeautifulSoup, but I could not open the page because of some restriction. I verified that robots.txt does not disallow anything, but the error I got with BeautifulSoup was "Unfortunately, a problem occurred while forwarding your request to the backend server".)

I would be thankful for any advice and hope I have explained my issue clearly. This is the code I tried in a Jupyter Notebook with Selenium installed:

from selenium import webdriver
import codecs
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options

url = "https://...."
driver = webdriver.Chrome('./chromedriver')
driver.implicitly_wait(0.5)
#maximize browser
driver.maximize_window()
#launch URL
driver.get(url)
#print(driver.page_source)
#Try 2
#print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//a[normalize-space()='Melde-ID']")))])
#close browser
driver.quit()

Solution

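  • From the HTML you shared, the element containing the desired information doesn't have a class attribute with the value Melde-ID, and it isn't an <a> element either, so //a[normalize-space()='Melde-ID'] cannot match it.
    It is an <h3> with class title, and its text contains Melde-ID (the full text is Melde-ID: 355460).
    Also, you should use an explicit WebDriverWait with an expected condition instead of driver.implicitly_wait(0.5); the Selenium documentation advises against mixing implicit and explicit waits.
    With these changes your code can look something like this: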

    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    
    url = "https://...."
    # Selenium 4 takes the chromedriver path through a Service object
    driver = webdriver.Chrome(service=Service('./chromedriver'))
    
    # explicit wait: poll for up to 20 seconds until the condition is met
    wait = WebDriverWait(driver, 20)
    
    try:
        # maximize browser
        driver.maximize_window()
        # launch URL
        driver.get(url)
    
        # wait until the element with class "title" containing "Melde-ID:" is visible, then read its text
        content = wait.until(EC.visibility_of_element_located(
            (By.XPATH, "//*[contains(@class,'title') and contains(.,'Melde-ID:')]"))).text
        print(content)
    finally:
        # close browser even if the wait times out
        driver.quit()
    

    I added .text to extract the text from that web element.
    content should now contain the text Melde-ID: 355460.
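If you only need the number itself (355460) rather than the whole heading text, you can strip the label from the string returned by .text. Below is a minimal sketch using a regular expression, assuming the heading text always has the form Melde-ID: followed by digits, as in the HTML shared above. The helper name extract_melde_id is hypothetical, not part of Selenium.

```python
import re
from typing import Optional

# Matches "Melde-ID:" followed by optional whitespace and captures the digits.
# Assumption: the ID is purely numeric, as in the example "Melde-ID: 355460".
_MELDE_ID_PATTERN = re.compile(r"Melde-ID:\s*(\d+)")


def extract_melde_id(text: str) -> Optional[str]:
    """Return the numeric Melde-ID found in text, or None if it is absent (hypothetical helper)."""
    match = _MELDE_ID_PATTERN.search(text)
    return match.group(1) if match else None


if __name__ == "__main__":
    # In the Selenium code, the argument would be the `content` string read via .text
    print(extract_melde_id("Melde-ID: 355460"))  # prints 355460
```

Returning a string rather than an int keeps any leading zeros intact; wrap the result in int() if you need a number. With the Selenium code above, extract_melde_id(content) should give '355460'.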