
Python Yahoo Stock Exchange (Web Scraping)


I'm having trouble with the following code. It's supposed to print stock prices by accessing Yahoo Finance, but I can't figure out why it's returning empty strings.

import urllib
import re

symbolslist = ["aapl","spy", "goog","nflx"]
i = 0
while i < len(symbolslist):
    url = "http://finance.yahoo.com/q?s="+symbolslist[i]+"&q1=1"
    htmlfile = urllib.urlopen(url)
    htmltext = htmlfile.read()

    regex = '<span id="yfs_l84_' + symbolslist[i] + '">(.+?)</span>'
    pattern = re.compile(regex)
    price = re.findall(pattern,htmltext)
    print price
    i+=1

Edit: It works fine now; it was a syntax error. I've edited the code above as well.


Solution

  • These are just a few helpful tips for Python development (and scraping):

    Python Requests library.

    The Python requests library makes HTTP requests much simpler than urllib: a single call fetches a page and gives you its status code and body.

    No need to use a while loop

    A for loop iterates over the list directly, so there is no index to initialise or increment.

    symbolslist = ["aapl","spy", "goog","nflx"]
    for symbol in symbolslist:
        pass  # do the per-symbol logic here
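
    As a minimal sketch of what the loop body might do, the snippet below builds the quote URL from the question for each symbol. The helper name quote_url is hypothetical, and urlencode is used here instead of string concatenation so the query string is escaped properly.

```python
from urllib.parse import urlencode

symbolslist = ["aapl", "spy", "goog", "nflx"]


def quote_url(symbol):
    """Build the quote URL used in the question for one symbol."""
    # urlencode turns the dict into "s=<symbol>&q1=1", escaping as needed.
    return "http://finance.yahoo.com/q?" + urlencode({"s": symbol, "q1": 1})


# No counter variable: the loop hands over each symbol in turn.
for symbol in symbolslist:
    print(quote_url(symbol))
```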
    

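    Use XPath over regular expressions

    import requests
    import lxml.html  # "import lxml" alone does not load the html submodule

    symbol = "aapl"
    url = "http://www.google.co.uk/finance?q="+symbol+"&q1=1"
    r = requests.get(url)
    xpath = '//your/xpath'
    root = lxml.html.fromstring(r.content)
    matches = root.xpath(xpath)  # everything that matches the expression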
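
    To make the XPath idea concrete without a network call, here is a rough sketch that runs against a small, made-up page. It uses the standard library's xml.etree.ElementTree rather than lxml, which is a deliberate swap: ElementTree only understands a subset of XPath and needs well-formed markup, so real-world HTML would still call for lxml. The sample markup, the prices in it, and the helper name extract_price are all assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for a downloaded quote page.
SAMPLE_PAGE = """
<html>
  <body>
    <span id="yfs_l84_aapl">101.25</span>
    <span id="yfs_l84_spy">202.50</span>
  </body>
</html>
"""


def extract_price(page, symbol):
    """Return the text of the span with id yfs_l84_<symbol>, or None."""
    root = ET.fromstring(page.strip())
    # XPath predicate: any <span> descendant whose id attribute matches.
    node = root.find(".//span[@id='yfs_l84_%s']" % symbol)
    return node.text if node is not None else None


for symbol in ["aapl", "spy", "nflx"]:
    print(symbol, extract_price(SAMPLE_PAGE, symbol))
```

    The expression names the element and attribute you want, so it keeps working if attribute order or whitespace in the tag changes, which is where a regex tends to break.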
    

    No need to compile your regular expressions each time.

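    Compiling a regular expression has a cost, so build the pattern once, outside the loop, and reuse it. That only works if the pattern does not depend on the loop variable, so capture the symbol instead of concatenating it in.

    # Compiled once: group 1 is the symbol, group 2 is the price.
    pattern = re.compile(r'<span id="yfs_l84_([a-z]+)">(.+?)</span>')

    for symbol in symbolslist:
        pass  # fetch htmltext, then call pattern.findall(htmltext)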
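
    Another way to keep compilation out of the loop while retaining the question's per-symbol pattern is to build all the patterns up front and look them up by symbol. This is only a sketch: the sample markup and the helper name find_price are made up for illustration, and re.escape is added in case a symbol ever contains a regex metacharacter.

```python
import re

symbolslist = ["aapl", "spy", "goog", "nflx"]

# Build every pattern once, before any page is processed.
patterns = {
    symbol: re.compile('<span id="yfs_l84_' + re.escape(symbol) + '">(.+?)</span>')
    for symbol in symbolslist
}

# Hypothetical markup standing in for a downloaded page.
sample_html = (
    '<span id="yfs_l84_aapl">101.25</span>'
    '<span id="yfs_l84_spy">202.50</span>'
)


def find_price(symbol, htmltext):
    """Return all matches for one symbol using its precompiled pattern."""
    return patterns[symbol].findall(htmltext)


for symbol in symbolslist:
    print(symbol, find_price(symbol, sample_html))
```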
    

    External Libraries

    As mentioned in the comment by drewk, both Pandas and Matplotlib have native functions to get Yahoo quotes, or you can use the ystockquote library to scrape from Yahoo. It is used like so:

    #!/usr/bin/env python
    import ystockquote
    
    symbolslist = ["aapl","spy", "goog","nflx"]
    for symbol in symbolslist:
        print (ystockquote.get_price(symbol))