Python HTML parsing using the BeautifulSoup framework

I'm using the Beautiful Soup framework to retrieve the link (the href) from the HTML content below:

         <div class="store">
                   <a title="Open in Google Play" href="" target="_blank">
                        <!-- ><span class="ui-icon app-store-gp"></span> -->
                        Google Play
                   </a><i class="icon-external-link"></i>

I used the following code to retrieve this in python:

 pageFile = urllib.urlopen("")
 pageHtml = pageFile.read()
 print pageHtml
 soup = BeautifulSoup("".join(pageHtml))
 item = soup.find("a", {"title":"Open in Google Play"})

 print item

I get None as the output. Any help would be greatly appreciated.

I printed out the html page and the output was as follows:

  <head><title>503 Service Temporarily Unavailable</title></head>
  <body bgcolor="white">
  <center><h1>503 Service Temporarily Unavailable</h1></center>

The page loads fine in the browser.


  • item = soup.find("a", {"title":"Open in Google Play"})

    You were initially searching for a "span" with a title "Open in Google Play", however the element that you're looking for is an "a" (a link).
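
    As a rough illustration of the point above, here is a minimal sketch of how find() behaves for the two tag names. It is written for Python 3 and assumes the bs4 package is installed, unlike the Python 2 snippets in this thread. The sample markup, the example.com href, and the helper name find_href are made up for the example.

```python
from bs4 import BeautifulSoup  # assumption: BeautifulSoup 4 (bs4) is installed

# Sample markup modelled on the snippet in the question. The href value is a
# placeholder, because the real URL is not shown in the original post.
SAMPLE_HTML = """
<div class="store">
  <a title="Open in Google Play" href="https://example.com/app" target="_blank">
    Google Play
  </a><i class="icon-external-link"></i>
</div>
"""


def find_href(html, tag_name="a"):
    """Return the href of the tag titled 'Open in Google Play', or None."""
    soup = BeautifulSoup(html, "html.parser")
    element = soup.find(tag_name, {"title": "Open in Google Play"})
    if element is None:
        # find() returns None when no element matches, which is what
        # happens when the tag name is wrong or the page is an error page.
        return None
    return element.get("href")


print(find_href(SAMPLE_HTML, "a"))     # the link is found
print(find_href(SAMPLE_HTML, "span"))  # no such span, so None
```

    Checking for None before using the result avoids an AttributeError later, for example when the server sends an error page instead of the expected HTML.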

    Edit: since it appears that the server returns a 503 error, try sending a common User-Agent header with the code below (untested, so it may not work; you'll need to import urllib2):

    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:25.0) Gecko/20100101 Firefox/25.0"}
    request = urllib2.Request(sampleURL, None, headers)
    soup = BeautifulSoup(urllib2.urlopen(request).read())
    item = soup.find("a", {"title": "Open in Google Play"})
    print item

    I also removed the unnecessary "".join(pageHtml): read() already returns a single string, so there is nothing to join.
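
    For readers on Python 3, where urllib2 was merged into urllib.request, the same idea might look roughly like the sketch below. It is untested against the site in the question, and the names DEFAULT_USER_AGENT, build_request and fetch_html are hypothetical helpers, not part of any library.

```python
import urllib.error
import urllib.request

# Assumption: any common browser User-Agent string would do here; this one is
# copied from the answer above.
DEFAULT_USER_AGENT = (
    "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:25.0) Gecko/20100101 Firefox/25.0"
)


def build_request(url, user_agent=DEFAULT_USER_AGENT):
    """Create a Request that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_html(url):
    """Return the response body as bytes, or None on an HTTP error status."""
    try:
        with urllib.request.urlopen(build_request(url)) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        # Raised for statuses such as 503 Service Temporarily Unavailable.
        print("Server returned HTTP %d" % err.code)
        return None
```

    The result of fetch_html(sampleURL) can be passed to BeautifulSoup once it has been checked for None. Whether a different User-Agent is enough to avoid the 503 depends on the server.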