python, python-3.x, selenium-webdriver, web-scraping, yahoo-finance

How should I properly use Selenium?


I'm trying to get one number from Yahoo Finance (http://finance.yahoo.com/quote/AAPL/financials?p=AAPL): Total Stockholder Equity on the Balance Sheet. When I inspect the element, I see this:

<span data-reactid=".1doxyl2xoso.1.$0.0.0.3.1.$main-0-Quote-Proxy.$main-0-Quote.0.2.0.2:1:$BALANCE_SHEET.0.0.$TOTAL_STOCKHOLDER_EQUITY.1:$0.0.0">119,355,000</span>

I would like to scrape the number: 119,355,000.

If I understand correctly, the web page is rendered with JavaScript, so I need to use Selenium to get to the desired number. My attempts (I'm a complete beginner) are not working no matter what I do. Below are three of many attempts. I tried to use 'data-reactid' and a few other things, and I'm running out of ideas :-)

elem = Browser.find_element_by_partial_link_text('TOTAL_STOCKHOLDER_EQUITY')
elem = browser.find_element_by_id('TOTAL_STOCKHOLDER_EQUITY') 
elem = browser.find_elem_by_id('TOTAL_STOCKHOLDER_EQUITY')

Solution

  • All of your locators are invalid. Try find_element_by_css_selector instead:

    elem = browser.find_element_by_css_selector("span[data-reactid *= 'TOTAL_STOCKHOLDER_EQUITY']")
    

    Note: find_element_by_partial_link_text locates only a (link) elements whose visible text partially matches the given string; it does not look at attribute values. find_element_by_id locates an element whose id attribute exactly matches the given value.
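
To make the matching rules in the note concrete, here is a minimal sketch that imitates them in plain Python against the span from the question. It does not use Selenium at all: `ElementCollector`, `parse`, `by_partial_link_text`, `by_id` and `by_attr_contains` are hypothetical helpers written only for this illustration, and they handle flat snippets like this one, not arbitrary nested HTML.

```python
from html.parser import HTMLParser

# The element from the question, copied as-is.
SNIPPET = (
    '<span data-reactid=".1doxyl2xoso.1.$0.0.0.3.1.$main-0-Quote-Proxy.$main-0-Quote'
    '.0.2.0.2:1:$BALANCE_SHEET.0.0.$TOTAL_STOCKHOLDER_EQUITY.1:$0.0.0">119,355,000</span>'
)


class ElementCollector(HTMLParser):
    """Collect tag name, attributes and direct text of each element."""

    def __init__(self):
        super().__init__()
        self.elements = []
        self._stack = []

    def handle_starttag(self, tag, attrs):
        element = {"tag": tag, "attrs": dict(attrs), "text": ""}
        self.elements.append(element)
        self._stack.append(element)

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Text is attributed to the innermost open element only.
        if self._stack:
            self._stack[-1]["text"] += data


def parse(html):
    collector = ElementCollector()
    collector.feed(html)
    collector.close()
    return collector.elements


def by_partial_link_text(elements, needle):
    # Only <a> elements qualify, and the needle is compared with their text.
    return [e for e in elements if e["tag"] == "a" and needle in e["text"]]


def by_id(elements, value):
    # The id attribute must equal the value exactly.
    return [e for e in elements if e["attrs"].get("id") == value]


def by_attr_contains(elements, tag, attr, needle):
    # Imitates the CSS selector tag[attr *= 'needle'] (substring match).
    return [
        e for e in elements
        if e["tag"] == tag and needle in (e["attrs"].get(attr) or "")
    ]


elements = parse(SNIPPET)
needle = "TOTAL_STOCKHOLDER_EQUITY"
print(len(by_partial_link_text(elements, needle)))  # 0: the span is not a link
print(len(by_id(elements, needle)))                 # 0: the span has no id attribute
matches = by_attr_contains(elements, "span", "data-reactid", needle)
print(matches[0]["text"])                           # 119,355,000
```

The first two lookups come back empty for the same reason the attempts in the question fail: the string TOTAL_STOCKHOLDER_EQUITY appears only inside the data-reactid attribute value, which neither link text nor id matching ever inspects.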

    Edited: The locator above matches more than one element, so first locate the exact Total Stockholder Equity row (the tr element), then collect all of its td elements:

    from selenium import webdriver
    from selenium.common.exceptions import WebDriverException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    browser = webdriver.Chrome()
    browser.get('http://finance.yahoo.com/quote/AAPL/financials?p=AAPL')
    browser.maximize_window()

    # Wait up to 5 seconds for each condition below
    wait = WebDriverWait(browser, 5)

    try:
        # First find the Balance Sheet link and click on it
        balance_sheet = wait.until(EC.element_to_be_clickable((By.XPATH, "//span[text() = 'Balance Sheet']")))
        balance_sheet.click()

        # Now find the row element of Total Stockholder Equity
        total_stock_row = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "tr[data-reactid *= 'TOTAL_STOCKHOLDER_EQUITY']")))

        # Now find all the columns in that row
        # (find_elements(By.TAG_NAME, ...) also works on Selenium 4, where the find_elements_by_* helpers were removed)
        total_columns = total_stock_row.find_elements(By.TAG_NAME, "td")

        # To print a single value, index into total_columns; otherwise print all values in the loop
        for elem in total_columns:
            print(elem.text)
            # It will print:
            # Total Stockholder Equity
            # 119,355,000
            # 111,547,000
            # 123,549,000
    except WebDriverException:
        # Covers timeouts raised by wait.until as well as other Selenium errors
        print('Was not able to find the element with that name.')
    

    Hope it helps...:)
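
The values printed above are strings with thousands separators, so they need one more step before they can be used as numbers. A small sketch of that step, assuming the row text looks exactly like the sample output in the answer; `parse_amount` is a hypothetical helper, and treating an empty cell or a lone '-' as a missing value is an assumption, not something the page is known to do:

```python
def parse_amount(text):
    """Convert a displayed figure such as '119,355,000' to an int.

    Returns None for an empty cell or a lone '-' (assumed placeholders).
    """
    cleaned = text.strip().replace(",", "")
    if not cleaned or cleaned == "-":
        return None
    return int(cleaned)


# Sample row copied from the output shown in the answer.
row = ["Total Stockholder Equity", "119,355,000", "111,547,000", "123,549,000"]
label = row[0]
values = [parse_amount(cell) for cell in row[1:]]
print(label, values)
# Total Stockholder Equity [119355000, 111547000, 123549000]
```

With Selenium, the same list could be built from the td elements by passing each element's text to the helper instead of the hard-coded strings.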