Search code examples
regexpython-3.xgoogle-finance

Can't parse Google Finance html


I'm trying to scrape some stock prices, and variations, from Google Finance using python3 but I just can't figure out if there's something wrong with the page, or my regex. I'm thinking that either the svg graphic or the many script tags throughout the page are making the regex parsers fail to properly analyze the code.

I have tested this regex on many online regex builders/testers and it looks ok. As ok as a regex designed for HTML can be, anyway.

The Google Finance page I'm testing this out on is https://www.google.com/finance?q=NYSE%3AAAPL And my python code is the following

import urllib.request
import re
page = urllib.request.urlopen('https://www.google.com/finance?q=NYSE%3AAAPL')
text = page.read().decode('utf-8')
m = re.search("id=\"price-panel.*>(\d*\d*\d\.\d\d)</span>.*\((-*\d\.\d\d%)\)", text, re.S)
print(m.groups())

It would extract the stock price and its percent variation. I have also tried using python2 + BeautifulSoup, like so

soup.find(id='price-panel')

but it returns empty even for this simple query. This is especially why I'm thinking that there's something weird with the html.

And here's the most important bit of html that I'm aiming for

<div id="price-panel" class="id-price-panel goog-inline-block">
<div>
<span class="pr">
<span class="unchanged" id="ref_22144_l"><span class="unchanged">96.41</span><span></span></span>
</span>
<div class="id-price-change nwp goog-inline-block">
<span class="ch bld"><span class="down" id="ref_22144_c">-1.13</span>
<span class="down" id="ref_22144_cp">(-1.16%)</span>
</span>
</div>
</div>
<div>
<span class="nwp">
Real-time:
&nbsp;
<span class="unchanged" id="ref_22144_ltt">3:42PM EDT</span>
</span>
<div class="mdata-dis">
<span class="dis-large"><nobr>NASDAQ
real-time data -
<a href="//www.google.com/help/stock_disclaimer.html#realtime" class="dis-large">Disclaimer</a>
</nobr></span>
<div>Currency in USD</div>
</div>
</div>
</div>

I'm wondering if any of you have encountered a similar problem with this page and/or can figure out if there's anything wrong with my code. Thanks in advance!


Solution

  • You might try a different URL that will be easier to parse, such as: http://www.google.com/finance/info?q=AAPL

    The catch is that Google has said that using this API in an application for public consumption is against their Terms of Service. Maybe there is an alternative that Google will allow you to use?