Tags: python, selenium-webdriver, tooltip, screen-scraping, mousehover

How do I scrape the text that appears when I hover the mouse over an element?


On the website https://www.cpubenchmark.net/cpu.php?cpu=Intel+Core+i9-11900K+%40+3.50GHz&id=3904 I am trying to scrape all the tooltip information (the CPU's price and date) from the "Pricing History" section:

    from selenium import webdriver
    from selenium.webdriver import ActionChains
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC

    options = Options()
    options.add_argument("start-maximized")
    webdriver_service = Service()
    driver = webdriver.Chrome(options=options, service=webdriver_service)

    driver.get('https://www.cpubenchmark.net/cpu.php?cpu=Intel+Core+i9-11900K+%40+3.50GHz&id=3904')
    element = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, "//*[@id='placeholder']/div/canvas[2]")))

    for el in element:
        ActionChains(driver).move_to_element(el).perform()
        mouseover = WebDriverWait(driver, 30).until(EC.visibility_of_element_located((By.SELECTOR, ".placeholder > div > div.canvasjs-chart-tooltip > div > span")))
        print(mouseover.text)

But the script fails with the error 'WebElement' object is not iterable. What do I have to change? Or is there a better way to scrape all the price and date information that appears on mouse hover in the "Pricing History" section?


Solution

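  • The chart's data points are already embedded in the page's HTML, so you can skip the browser and the mouse hovering entirely and extract them with a regular expression. To get the times/prices from the graph into a pandas DataFrame, you can use this example: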

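    import re

    import pandas as pd
    import requests

    url = (
        "https://www.cpubenchmark.net/cpu.php?cpu=Intel+Core+i9-11900K+%40+3.50GHz&id=3904"
    )

    # Download the raw HTML; the chart data is embedded in it as JavaScript,
    # so no browser or mouse hovering is needed
    html_text = requests.get(url).text

    # Each chart point appears as dataArray.push({x: <timestamp>, y: <price>})
    df = pd.DataFrame(
        re.findall(r"dataArray\.push\({x: (\d+), y: ([\d.]+)}", html_text),
        columns=["time", "price"],
    )

    # x is a Unix timestamp in milliseconds: convert to seconds, then to datetime
    df["time"] = pd.to_datetime(df["time"].astype(int) // 1000, unit="s")
    print(df.tail())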
    

    Prints:

                       time   price
    236 2023-05-28 06:00:00  317.86
    237 2023-05-29 06:00:00  319.43
    238 2023-05-30 06:00:00  429.99
    239 2023-05-31 06:00:00  314.64
    240 2023-06-01 06:00:00   318.9
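  • As for the error in the question: presence_of_element_located returns a single WebElement, not a list, so "for el in element" fails (presence_of_all_elements_located returns a list). There is also no By.SELECTOR locator; the CSS locator is By.CSS_SELECTOR. Even with those fixed, move_to_element moves the mouse to the middle of the canvas, so it would only ever show one tooltip rather than every data point. If you already have a Selenium driver open, a minimal sketch is to feed driver.page_source to the same regex instead of hovering. The function names parse_price_history and price_history_from_driver are hypothetical, the sample HTML is a made-up snippet that only assumes the dataArray.push(...) format the answer's regex matches, and x is assumed to be in milliseconds (as the answer's "// 1000" implies):

```python
import re
from types import SimpleNamespace

import pandas as pd

# Same pattern as the answer above: each chart point is assumed to appear in
# the page source as dataArray.push({x: <ms timestamp>, y: <price>})
POINT_RE = re.compile(r"dataArray\.push\({x: (\d+), y: ([\d.]+)}")


def parse_price_history(html_text: str) -> pd.DataFrame:
    """Hypothetical helper: extract (time, price) pairs into a typed DataFrame."""
    df = pd.DataFrame(POINT_RE.findall(html_text), columns=["time", "price"])
    # Assumes x is a Unix timestamp in milliseconds; let pandas convert it directly
    df["time"] = pd.to_datetime(df["time"].astype("int64"), unit="ms")
    # Convert prices from strings to floats so they can be compared or plotted
    df["price"] = df["price"].astype(float)
    return df


def price_history_from_driver(driver) -> pd.DataFrame:
    """Hypothetical helper: reuse an open Selenium WebDriver, no hovering needed."""
    return parse_price_history(driver.page_source)


# Made-up snippet mimicking the assumed format (not real page data)
sample_html = """
<script>
dataArray.push({x: 1685253600000, y: 317.86});
dataArray.push({x: 1685340000000, y: 319.43});
</script>
"""

df = parse_price_history(sample_html)
print(df)

# Stand-in object with a page_source attribute, used here instead of a real driver
fake_driver = SimpleNamespace(page_source=sample_html)
print(price_history_from_driver(fake_driver))
```

With a real browser session you would call price_history_from_driver(driver) after driver.get(url). Using unit="ms" gives the same times as the answer's "// 1000" with unit="s" for whole-second timestamps, and keeps sub-second precision otherwise.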