Tags: selenium-webdriver, web-scraping, xpath, dynamic, find

Selenium returns looked up element unreliably


I have been stuck on this for many hours and have looked through hundreds of questions and answers here.

I want to scrape data from a bank's product page, e.g. the "Delta" value from:

https://wertpapiere.ing.de/Investieren/Derivat/DE000HS2JL06

(The link will be dead after 17.09.2024, when the product expires.)

delta.text should be -0,0193

First attempt:

delta = driver.find_element(By.XPATH, '//*[text()=\'Delta\']/following-sibling::td')

This works sometimes, but mostly not. Why? It might be because "Delta" appears 10 times on the page, but then:

delta = driver.find_element(By.XPATH, '//*[text()=\'Delta\']/[5]following-sibling::td') 

should solve the issue, but it doesn't.

Another try:

delta = driver.find_element(By.XPATH, '//td[contains(text(), "Delta")]/following-sibling::td') 

should work, but doesn't either.

An attempt with the full path should solve the issue:

delta = driver.find_element(By.XPATH, '/html/body/main/div[2]/div/div[2]/div[1]/sh-derivative-greeks/div/div[1]/div/table/tbody/tr[2]/td[2]')

but the element can't be found, I assume because of the dynamic IDs the site generates.

Does anyone have the decisive tip?

Thanks so much! Chris


Solution

  • The information on that page is loaded via XHR calls to various API endpoints. You can inspect those endpoints in your browser's Dev Tools -> Network tab. Here is how you can get that particular delta value:

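    import requests
    import pandas as pd
    
    # Show every column and the full cell contents when printing the DataFrame.
    pd.set_option('display.max_columns', None)
    pd.set_option('display.max_colwidth', None)
    
    # Send a browser-like User-Agent with the request.
    headers = {
        'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.79 Safari/537.36'
    }
    
    # Fetch the Greeks for this ISIN and flatten the nested JSON into columns.
    url = 'https://component-api.wertpapiere.ing.de/api/v1/derivative/greeks/DE000HS2JL06'
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()
    df = pd.json_normalize(response.json())
    # Print everything except the long description text.
    print(df.drop(columns=['description.value'], errors='ignore'))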
    

    Result in terminal:

    isin    isEmpty isVisible   description.label   delta.value delta.formatString  delta.formatType    delta.label gamma.value gamma.formatString  gamma.formatType    gamma.label theta.value theta.formatString  theta.formatType    theta.label vega.value  vega.formatString   vega.formatType vega.label  rho.value   rho.formatString    rho.formatType  rho.label   omega.value omega.formatString  omega.formatType    omega.label labels.title    labels.noInformationAvailable
    0   DE000HS2JL06    False   True    Erklärung der Griechen  -0.0193 0,0.00[00]  Number  Delta   0.0003  0,0.00[00]  Number  Gamma   -0.003  0,0.00[00]  Number  Theta   0.0139  0,0.00[00]  Number  Vega    -0.0382 0,0.00[00]  Number  Rho -8.8416 0,0.00  Number  Omega (Hebel)   Griechen    Keine Informationen vorhanden
    

    See the Requests documentation for details.
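
If only one value is needed, pandas is optional. The following is a minimal sketch, assuming the endpoint returns the nested layout implied by the flattened column names above (`delta.value`, `delta.label`, and so on). The helper name `extract_greek` and the sample payload are illustrative and not part of the site's API:

```python
def extract_greek(payload: dict, name: str) -> float:
    """Return payload[name]["value"] as a float, e.g. name="delta"."""
    try:
        return float(payload[name]["value"])
    except (KeyError, TypeError) as exc:
        raise ValueError(f"no value for {name!r} in response") from exc


# Sample shaped like the flattened columns shown above (assumed layout,
# trimmed to two Greeks). With a live response, pass in
# requests.get(url, headers=headers).json() instead.
sample = {
    "isin": "DE000HS2JL06",
    "delta": {"value": -0.0193, "formatType": "Number", "label": "Delta"},
    "gamma": {"value": 0.0003, "formatType": "Number", "label": "Gamma"},
}

print(extract_greek(sample, "delta"))  # -0.0193
```

Raising a `ValueError` for a missing key keeps a changed response format from failing silently.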

    EDIT: If you prefer Selenium:

    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    
    chrome_options = Options()
    chrome_options.add_argument("--no-sandbox")
    chrome_options.add_argument("--disable-notifications")
    chrome_options.add_argument("--window-size=1280,1080")
    
    with webdriver.Chrome(options=chrome_options) as driver:
        # Wait up to 15 seconds for the dynamically loaded Greeks table.
        wait = WebDriverWait(driver, 15)
        driver.get('https://wertpapiere.ing.de/Investieren/Derivat/DE000HS2JL06')
        delta_cell = wait.until(EC.presence_of_element_located((
            By.XPATH,
            '//div[@class="sh-derivative-greeks"]//tr[@name="delta"]//td[@class="value"]',
        )))
        delta_value = delta_cell.get_attribute('innerHTML')
        print(delta_value)
    

    Result in terminal:

    -0,0193
    

    See the Selenium documentation for details.
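
The page renders the number with a German decimal comma (`-0,0193`), while the API returns a float (`-0.0193`). Here is a small sketch for converting the scraped text before doing arithmetic with it. `parse_german_number` is a hypothetical helper name, and it assumes that `.` only ever appears as a thousands separator:

```python
def parse_german_number(text: str) -> float:
    """Convert text such as '-0,0193' or '1.234,56' to a float."""
    # Drop thousands separators, then turn the decimal comma into a point.
    cleaned = text.strip().replace(".", "").replace(",", ".")
    return float(cleaned)


print(parse_german_number("-0,0193"))   # -0.0193
print(parse_german_number("1.234,56"))  # 1234.56
```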