
How do I grab quarterly Yahoo Finance financial data, and specify the date, with Python?


I can download the annual data from this link with the following code, but it doesn't match what the website shows, because my data is for June:


Now I have two questions:

  1. How do I specify the date so the annual data matches the following picture (September instead of June, as shown in the red rectangle)?
  2. Clicking "Quarterly" (orange rectangle) doesn't change the URL. How do I grab the quarterly data?

Thanks.



Solution

  • Just curious: why write the HTML to a file first and then read it with pandas? pd.read_html can read the page directly from the URL:

    import pandas as pd
    
    symbol = 'AAPL'
    url = f'https://finance.yahoo.com/quote/{symbol}/financials?p={symbol}'
    
    # read_html returns a list of DataFrames, one per <table> on the page
    dfs = pd.read_html(url)
    print(dfs[0])
    

    Secondly, I'm not sure why yours shows the June dates. Doing it the way I have above shows September:

    print(dfs[0])
                                             0  ...                                  4
    0                                  Revenue  ...                          9/26/2015
    1                            Total Revenue  ...                          233715000
    2                          Cost of Revenue  ...                          140089000
    3                             Gross Profit  ...                           93626000
    4                       Operating Expenses  ...                 Operating Expenses
    5                     Research Development  ...                            8067000
    6       Selling General and Administrative  ...                           14329000
    7                            Non Recurring  ...                                  -
    8                                   Others  ...                                  -
    9                 Total Operating Expenses  ...                          162485000
    10                Operating Income or Loss  ...                           71230000
    11       Income from Continuing Operations  ...  Income from Continuing Operations
    12         Total Other Income/Expenses Net  ...                            1285000
    13      Earnings Before Interest and Taxes  ...                           71230000
    14                        Interest Expense  ...                            -733000
    15                       Income Before Tax  ...                           72515000
    16                      Income Tax Expense  ...                           19121000
    17                       Minority Interest  ...                                  -
    18          Net Income From Continuing Ops  ...                           53394000
    19                    Non-recurring Events  ...               Non-recurring Events
    20                 Discontinued Operations  ...                                  -
    21                     Extraordinary Items  ...                                  -
    22            Effect Of Accounting Changes  ...                                  -
    23                             Other Items  ...                                  -
    24                              Net Income  ...                         Net Income
    25                              Net Income  ...                           53394000
    26   Preferred Stock And Other Adjustments  ...                                  -
    27  Net Income Applicable To Common Shares  ...                           53394000
    
    [28 rows x 5 columns]
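
To check which period ends you actually received (question 1), it can help to reshape the raw table into one row per period. This is a minimal sketch that assumes the layout in the output above: line-item labels in column 0, period-end dates in row 0 formatted like 9/26/2015, section-header rows that repeat their label in every cell, and "-" for missing values. `tidy_financials` is a hypothetical helper, not part of pandas, and Yahoo's page layout may have changed since this answer was written.

```python
import pandas as pd


def tidy_financials(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: reshape a raw read_html table (labels in column 0,
    period-end dates in row 0) into one row per period end."""
    # Row 0 holds the period-end dates, e.g. "9/26/2015" (assumed format)
    dates = pd.to_datetime(raw.iloc[0, 1:].tolist(), format="%m/%d/%Y")
    body = raw.iloc[1:].set_index(0)
    body.columns = dates
    # Section-header rows (e.g. "Operating Expenses") repeat their label in every cell
    labels = body.index.to_numpy()
    is_header = (body.to_numpy() == labels[:, None]).all(axis=1)
    body = body[~is_header]
    # Convert to numbers; anything non-numeric, such as "-", becomes NaN
    body = body.apply(pd.to_numeric, errors="coerce")
    # Transpose so each row is one period end and each column one line item
    return body.T


# Toy input copied from the last column of the output above
raw = pd.DataFrame([
    ["Revenue", "9/26/2015"],
    ["Total Revenue", "233715000"],
    ["Cost of Revenue", "140089000"],
    ["Gross Profit", "93626000"],
    ["Operating Expenses", "Operating Expenses"],
    ["Research Development", "8067000"],
    ["Non Recurring", "-"],
])
tidy = tidy_financials(raw)
print(tidy.index.month.tolist())
print(tidy["Gross Profit"].tolist())
```

With the periods as a DatetimeIndex, `tidy.index.month` shows at a glance whether you got September or June fiscal year ends.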
    

    For the second part, you could get the data in one of a few ways:

    1) Inspect the XHR requests in the browser's developer tools, find the request URL that generates the data, and call it directly with the right parameters so it returns JSON. (I looked but couldn't find it right away, so I moved on to the next option.)

    2) Search through the <script> tags, since the data is sometimes embedded there as JSON. (I didn't search them very thoroughly; Selenium seemed more direct, since pandas can read the rendered tables.)

    3) Use Selenium to open a browser, read the annual table, click "Quarterly", then read the quarterly table.
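
Option 2 could be sketched with just the standard library. This assumes the page embeds its data as a JavaScript assignment of a JSON value inside a <script> tag; the variable name you pass in (the toy example uses `root.App.main`) is an assumption you would need to confirm by viewing the real page source. `extract_assigned_json` is a hypothetical helper.

```python
import json
import re


def extract_assigned_json(html, var_name):
    """Hypothetical helper: return the JSON value assigned to `var_name`
    inside the page's <script> tags, or None if no such assignment exists."""
    # Look only at the contents of <script>...</script> blocks
    scripts = re.findall(r"<script[^>]*>(.*?)</script>", html,
                         flags=re.DOTALL | re.IGNORECASE)
    # Match e.g. "root.App.main = " (dots escaped so they match literally)
    assignment = re.compile(re.escape(var_name) + r"\s*=\s*")
    for script in scripts:
        match = assignment.search(script)
        if match:
            # raw_decode parses one JSON value and ignores whatever follows it (e.g. ";")
            value, _ = json.JSONDecoder().raw_decode(script[match.end():])
            return value
    return None


# Toy page standing in for a downloaded HTML string or driver.page_source
page = """<html><head>
<script>var unrelated = 1;</script>
<script>root.App.main = {"key": [1, 2, 3]};
</script>
</head></html>"""
print(extract_assigned_json(page, "root.App.main"))
```

The path to the quarterly figures inside the returned dict depends on the page and would have to be found by inspection.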

    I went with option 3:

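    from io import StringIO
    
    import pandas as pd
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait
    
    symbol = 'AAPL'
    url = f'https://finance.yahoo.com/quote/{symbol}/financials?p={symbol}'
    
    # Selenium 4 takes the chromedriver path through a Service object
    driver = webdriver.Chrome(service=Service('C:/chromedriver_win32/chromedriver.exe'))
    try:
        driver.get(url)
    
        # Get the annual table shown in the browser
        dfs_annual = pd.read_html(StringIO(driver.page_source))
        print(dfs_annual[0])
    
        # Click "Quarterly" once it is clickable (find_element_by_xpath was removed in Selenium 4)
        before = driver.page_source
        WebDriverWait(driver, 10).until(
            EC.element_to_be_clickable((By.XPATH, "//span[text()='Quarterly']"))
        ).click()
    
        # Wait until the page source changes (a rough signal that the table re-rendered)
        WebDriverWait(driver, 10).until(lambda d: d.page_source != before)
        dfs_quarter = pd.read_html(StringIO(driver.page_source))
        print(dfs_quarter[0])
    finally:
        # quit() closes the browser and ends the chromedriver session
        driver.quit()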