I'm trying to write a Python script that checks money.rediff.com for a particular stock price and prints it. I know this can be done easily with their API, but I want to learn how urllib2 works, so I'm trying to do this the old-fashioned way. However, I'm stuck on how to use urllib2. Many tutorials online told me to use "Inspect element" on the value I need and then split the string to get it. But all the examples in the videos have their values inside HTML tags that are easy to split on, whereas mine looks like this:
<div class="f16">
<span id="ltpid" class="bold" style="color: rgb(0, 0, 0); background: rgb(255, 255, 255);">6.66</span>
<span id="change" class="green">+0.50</span>
<span id="ChangePercent" style="color: rgb(130, 130, 130); font-weight: normal;">+8.12%</span>
</div>
I only need the "6.66" from the second line (the span with id="ltpid"). How do I go about doing this? I'm very new to urllib2 and Python. All help will be greatly appreciated. Thanks in advance.
You can certainly do this with just urllib2 and perhaps a regular expression, but I'd encourage you to use better tools, namely requests and Beautiful Soup.
Here's a complete program to fetch a quote for "Tata Motors Ltd.":
from bs4 import BeautifulSoup
import requests
html = requests.get('http://money.rediff.com/companies/Tata-Motors-Ltd/10510008').content
soup = BeautifulSoup(html, 'html.parser')
quote = float(soup.find(id='ltpid').get_text())
print(quote)
EDIT
Here's a Python 2 version just using urllib2 and re:
import re
import urllib2
html = urllib2.urlopen('http://money.rediff.com/companies/Tata-Motors-Ltd/10510008').read()
quote = float(re.search('<span id="ltpid"[^>]*>([^<]*)', html).group(1))
print quote
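For anyone on Python 3, where urllib2 no longer exists, the same idea can be sketched with urllib.request from the standard library. This is only a rough equivalent, not a tested scraper for the live site: the helper names extract_quote and fetch_quote are made up for illustration, the page is assumed to be UTF-8, and the comma stripping is an assumption about how larger prices might be formatted.

```python
import re
import urllib.request

# Same pattern as the Python 2 version: capture the text inside the
# span whose id is "ltpid".
LTP_PATTERN = re.compile(r'<span id="ltpid"[^>]*>([^<]*)')


def extract_quote(html):
    """Return the price as a float, or None if the span is not found.

    Hypothetical helper, not part of any library.
    """
    match = LTP_PATTERN.search(html)
    if match is None:
        return None
    # Assumption: larger prices may contain thousands separators.
    return float(match.group(1).strip().replace(',', ''))


def fetch_quote(url):
    """Download the page at url and extract the quote (hypothetical helper)."""
    with urllib.request.urlopen(url) as response:
        # Assumption: the page is UTF-8; undecodable bytes are replaced.
        html = response.read().decode('utf-8', errors='replace')
    return extract_quote(html)
```

Calling fetch_quote('http://money.rediff.com/companies/Tata-Motors-Ltd/10510008') should then behave like the Python 2 version, provided the page still has the same markup. Keeping the parsing in its own function means you can try it on a saved HTML snippet without hitting the site.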