I've been stuck on this for many hours and have looked through hundreds of questions and answers here.
I want to scrape data from a bank's product page, e.g. "Delta" from:
https://wertpapiere.ing.de/Investieren/Derivat/DE000HS2JL06
(the link will be dead on 17.09.2024, as the product expires then)
delta.text should be -0,0193
First attempt:
delta = driver.find_element(By.XPATH, '//*[text()=\'Delta\']/following-sibling::td')
This works sometimes, but mostly not. Why? It could be because "Delta" appears 10 times on the page, but then:
delta = driver.find_element(By.XPATH, '//*[text()=\'Delta\']/[5]following-sibling::td')
should solve the issue, but it doesn't.
Another try:
delta = driver.find_element(By.XPATH, '//td[contains(text(), "Delta")]/following-sibling::td')
should work, but it doesn't either.
The attempt with the full path should solve the issue:
delta = driver.find_element(By.XPATH, '/html/body/main/div[2]/div/div[2]/div[1]/sh-derivative-greeks/div/div[1]/div/table/tbody/tr[2]/td[2]')
but the element can't be found. I assume this is because of the dynamic IDs the site generates.
Does anyone have the decisive tip?
Thanks so much! Chris
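The information on that page is loaded via XHR calls to various API endpoints, which is probably why find_element only works sometimes: the table may not be rendered yet when the lookup runs. You can inspect those endpoints in the browser's Dev Tools -> Network tab.
Here is how you can get that particular delta value: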
import requests
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)
headers = {
'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.79 Safari/537.36'
}
df = pd.json_normalize(requests.get('https://component-api.wertpapiere.ing.de/api/v1/derivative/greeks/DE000HS2JL06', headers=headers).json())
print(df[[x for x in df.columns if x not in ['description.value']]])
Result in terminal:
isin isEmpty isVisible description.label delta.value delta.formatString delta.formatType delta.label gamma.value gamma.formatString gamma.formatType gamma.label theta.value theta.formatString theta.formatType theta.label vega.value vega.formatString vega.formatType vega.label rho.value rho.formatString rho.formatType rho.label omega.value omega.formatString omega.formatType omega.label labels.title labels.noInformationAvailable
0 DE000HS2JL06 False True Erklärung der Griechen -0.0193 0,0.00[00] Number Delta 0.0003 0,0.00[00] Number Gamma -0.003 0,0.00[00] Number Theta 0.0139 0,0.00[00] Number Vega -0.0382 0,0.00[00] Number Rho -8.8416 0,0.00 Number Omega (Hebel) Griechen Keine Informationen vorhanden
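If only the single delta value is needed, pandas is not strictly required. Below is a minimal sketch, assuming the JSON is nested the way the column names above suggest (`delta.value` coming from `{"delta": {"value": ...}}`). The helper names `fetch_greeks`, `extract_greek` and `to_german_decimal` are made up for illustration, and the sketch uses the standard library's `urllib` instead of `requests` so that it has no third-party dependencies:

```python
import json
import urllib.request

# Endpoint taken from the answer above; the {isin} placeholder is an addition.
GREEKS_URL = "https://component-api.wertpapiere.ing.de/api/v1/derivative/greeks/{isin}"
HEADERS = {"User-Agent": "Mozilla/5.0"}


def fetch_greeks(isin: str) -> dict:
    """Download the greeks JSON for one ISIN (hypothetical helper)."""
    request = urllib.request.Request(GREEKS_URL.format(isin=isin), headers=HEADERS)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def extract_greek(payload: dict, name: str = "delta") -> float:
    """Read payload[name]["value"]; the nesting is an assumption."""
    return float(payload[name]["value"])


def to_german_decimal(value: float, digits: int = 4) -> str:
    """Format a float with a decimal comma, as the website displays it."""
    return f"{value:.{digits}f}".replace(".", ",")
```

Calling `to_german_decimal(extract_greek(fetch_greeks("DE000HS2JL06")))` should then give the same kind of string the page shows, as long as the endpoint still serves that product.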
Requests documentation can be found here.
EDIT: If you prefer Selenium:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
chrome_options = Options()
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--disable-notifications")
chrome_options.add_argument("--window-size=1280,1080")
with webdriver.Chrome(options=chrome_options) as driver:
    wait = WebDriverWait(driver, 15)
    driver.get('https://wertpapiere.ing.de/Investieren/Derivat/DE000HS2JL06')
    delta_value = wait.until(EC.presence_of_element_located((By.XPATH, '//div[@class="sh-derivative-greeks"]//tr[@name="delta"]//td[@class="value"]'))).get_attribute('innerHTML')
    print(delta_value)
Result in terminal:
-0,0193
Selenium documentation can be found here.
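Note that the Selenium route returns the value as a German-formatted string ("-0,0193"), not as a number. If you want to calculate with it, it has to be converted first. A small sketch, assuming the page uses "." as the thousands separator and "," as the decimal separator; `parse_german_number` is a hypothetical helper name:

```python
def parse_german_number(text: str) -> float:
    """Convert a string such as '-0,0193' or '1.234,5' to a float."""
    # Drop thousands separators first, then switch the decimal comma to a point.
    cleaned = text.strip().replace(".", "").replace(",", ".")
    return float(cleaned)
```

With the output above, `parse_german_number(delta_value)` would give a float that can be compared or used in further calculations.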