python, selenium, pubchem

Getting a number from pubchem site with Selenium


I'm searching the PubChem site with the code below. I need to get the "Compound CID:" number from the search results page, but I haven't been able to. Any help would be appreciated.

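import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

driver = webdriver.Chrome()
url = "https://pubchem.ncbi.nlm.nih.gov/"
driver.get(url)
driver.maximize_window()
# find_element_by_xpath() was removed in Selenium 4.3; find_element(By.XPATH, ...) is the current form
searchInput = driver.find_element(By.XPATH, "/html/body/div[1]/div/div/main/div[1]/div/div[2]/div/div[2]/form/div/div[1]/input")
searchInput.click()
searchInput.send_keys("75-05-8")
searchInput.send_keys(Keys.ENTER)
time.sleep(2)
driver.close()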

Solution

  • To print the text 6342 (the Compound CID), you can use either of the following locator strategies:

    • Using css_selector and get_attribute("innerHTML"):

      print(driver.find_element(By.CSS_SELECTOR, "a[data-label^='Featured Compound Result Secondary Link; Position:1; Page:1'] > span.breakword > span").get_attribute("innerHTML"))
      
    • Using xpath and text attribute:

      print(driver.find_element(By.XPATH, "//a[starts-with(@data-label, 'Featured Compound Result Secondary Link; Position:1; Page:1')]/span[@class='breakword']/span").text)
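Here both variants should print 6342, since the inner <span> appears to hold plain text only. The two differ when the element contains markup: get_attribute("innerHTML") returns the raw HTML, while .text returns the rendered, visible text. The stdlib-only sketch below approximates that difference offline; the helper inner_html_to_text() and the sample strings are hypothetical and not part of Selenium or PubChem.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects the character data of an HTML fragment, dropping the tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &amp; etc. into text
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def inner_html_to_text(inner_html: str) -> str:
    """Rough approximation of turning an innerHTML string into plain text.

    Unlike Selenium's .text, this ignores CSS visibility and layout;
    it only strips tags, decodes entities and trims surrounding whitespace.
    """
    collector = _TextCollector()
    collector.feed(inner_html)
    collector.close()  # flush any buffered trailing text
    return "".join(collector.parts).strip()


# Hypothetical innerHTML values, not captured from PubChem:
print(inner_html_to_text("6342"))           # prints 6342 (plain text is unchanged)
print(inner_html_to_text(" <b>6342</b> "))  # prints 6342 (markup is stripped)
```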
      

    Ideally, you should use WebDriverWait to wait until the element is visible (visibility_of_element_located()) instead of reading it immediately. You can use either of the following locator strategies:

    • Using CSS_SELECTOR and text attribute:

      driver.get("https://pubchem.ncbi.nlm.nih.gov/")
      WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "input[type='text'][id^='search']"))).send_keys("75-05-8" + Keys.RETURN)
      print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "a[data-label^='Featured Compound Result Secondary Link; Position:1; Page:1'] > span.breakword > span"))).text)
      
    • Using XPATH and get_attribute("innerHTML"):

      driver.get("https://pubchem.ncbi.nlm.nih.gov/")
      WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//input[@type='text'][starts-with(@id, 'search')]"))).send_keys("75-05-8" + Keys.RETURN)
      print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//a[starts-with(@data-label, 'Featured Compound Result Secondary Link; Position:1; Page:1')]/span[@class='breakword']/span"))).get_attribute("innerHTML"))
      
    • Note: you have to add the following imports:

      from selenium.webdriver.support.ui import WebDriverWait
      from selenium.webdriver.common.by import By
      from selenium.webdriver.common.keys import Keys
      from selenium.webdriver.support import expected_conditions as EC
      
    • Console Output:

      6342
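A fixed time.sleep(2), as in the question, either wastes time or fails when the results load slowly; an explicit wait instead polls until a condition holds or a timeout expires. The stdlib-only sketch below shows that polling idea in isolation. wait_until() and fake_cid_visible() are hypothetical names, and this is a simplified illustration, not Selenium's actual WebDriverWait implementation (which also passes the driver to the condition and ignores NoSuchElementException while polling).

```python
import time


def wait_until(condition, timeout=20.0, poll_frequency=0.5):
    """Call condition() until it returns a truthy value or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value  # condition met: hand back its result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll_frequency)  # wait briefly before polling again


# Hypothetical stand-in for a search result that is still loading:
# the CID "appears" on the third poll.
attempts = {"count": 0}


def fake_cid_visible():
    attempts["count"] += 1
    return "6342" if attempts["count"] >= 3 else None


print(wait_until(fake_cid_visible, timeout=5, poll_frequency=0.01))  # prints 6342
```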
      

    You can find a relevant discussion in "How to retrieve the text of a WebElement using Selenium - Python".
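Whichever locator you use, .text and get_attribute("innerHTML") both return a string. If you need the CID as a number, a small validating conversion such as the hypothetical parse_cid() below (an illustration, not a Selenium or PubChem function) also catches the case where the locator matched an unexpected element.

```python
import re


def parse_cid(text: str) -> int:
    """Convert scraped CID text such as '6342' to an int.

    Raises ValueError if the text is not a single run of ASCII digits,
    for example when the locator matched the wrong element.
    """
    cleaned = text.strip()
    if not re.fullmatch(r"[0-9]+", cleaned):
        raise ValueError(f"unexpected CID text: {text!r}")
    return int(cleaned)


print(parse_cid("6342\n"))  # prints 6342
```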

