Tags: python, web-scraping, finance

Yahoo Finance Download Data


I am trying to scrape finance.yahoo.com and download a data file, specifically from this URL: https://finance.yahoo.com/quote/AAPL/history?p=AAPL

I would like to complete two objectives here:

  1. Set the data time period parameter to "Max", which I believe requires Selenium.
  2. Download and save the data file linked in the href that appears when I inspect "Download Data".

So far, I am unable to access the drop-down required to click "Max" and also cannot locate the href required to download the file.

from selenium import webdriver
import time
from selenium.webdriver.chrome.options import Options

options = webdriver.ChromeOptions()
options.add_argument('--log-level=3')

stock = input()
base_url = 'https://finance.yahoo.com/quote/{}/history?p={}'.format(stock, stock)
driver = webdriver.Chrome()
driver.get(base_url)
driver.maximize_window()
driver.implicitly_wait(4)
driver.find_element_by_class_name("Fl(end) Mt(3px) Cur(p)").click()
time.sleep(4)
driver.quit()

Solution

  • The following shows selectors you can use. I haven't added any wait conditions because the only one needed is a wait for all the new data to be present after pressing the Apply button, and in my test runs I couldn't find a suitable condition for that. Instead, I use a hard-coded time.sleep(5), which should be replaced with a condition-based wait if possible.

    from selenium import webdriver
    from selenium.common.exceptions import NoSuchElementException
    from selenium.webdriver.common.by import By
    # from selenium.webdriver.support.ui import WebDriverWait
    # from selenium.webdriver.support import expected_conditions as EC
    import time
    
    d = webdriver.Chrome()
    d.get('https://finance.yahoo.com/quote/AAPL/history?p=AAPL')
    try:
        d.find_element(By.CSS_SELECTOR, '[name=agree]').click()  # oauth prompt, if shown
    except NoSuchElementException:
        pass  # no prompt on this visit
    
    # Raw strings keep the backslashes that escape the parentheses in the class names.
    d.find_element(By.CSS_SELECTOR, '[data-icon=CoreArrowDown]').click()  # dropdown
    d.find_element(By.CSS_SELECTOR, '[data-value=MAX]').click()  # max
    d.find_element(By.CSS_SELECTOR, r'button.Fl\(start\)').click()  # done
    d.find_element(By.CSS_SELECTOR, r'button.Fl\(end\) span').click()  # apply
    time.sleep(5)  # placeholder for a condition-based wait
    d.find_element(By.CSS_SELECTOR, '[download]').click()  # download
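
One possible way to replace the hard-coded sleep is to poll until something on the page visibly changes. The sketch below is not tested against the live page: it assumes that the href of the "Download Data" link changes once the new date range has been applied, which should be checked in the browser first. The helper names wait_until and changed_href are made up for this illustration and are not part of Selenium; Selenium's own WebDriverWait(...).until(...) does the same kind of polling.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll)


def changed_href(driver, selector, old_href):
    """Return the element's href if it differs from old_href, else None.

    Assumption: the download link's href changes when the range is applied.
    """
    # "css selector" is the string value of selenium's By.CSS_SELECTOR.
    element = driver.find_element("css selector", selector)
    href = element.get_attribute("href")
    return href if href and href != old_href else None
```

In the script above, this would mean reading the link's href before clicking the apply button (old = d.find_element(By.CSS_SELECTOR, '[download]').get_attribute('href')) and then calling wait_until(lambda: changed_href(d, '[download]', old)) in place of time.sleep(5). If the page re-renders the link while polling, find_element may raise, so a real script would probably need to catch that and retry.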