Parsing IMDB with BeautifulSoup

I've scraped the following HTML from IMDB's mobile site using BeautifulSoup with Python 2.7.

I want to create separate objects for the episode number '1', the title 'Winter Is Coming', and the IMDB score '8.9', but I can't figure out how to split apart the episode number and the title.

   <a class="btn-full" href="/title/tt1480055?ref_=m_ttep_ep_ep1">
     <span class="text-large">
      1.
      <strong>
       Winter Is Coming
      </strong>
     </span>
     <br/>
     <span class="mobile-sprite tiny-star">
     </span>
     <strong>
      8.9
     </strong>
     17 Apr. 2011
    </a>

Solution

You can use find to locate the span with the class text-large, which holds both the episode number and the title.

Once you have that span, its next_element (spelled next in older code) is the text node containing the episode number, and find locates the strong tag containing the title:

html = """
<a class="btn-full" href="/title/tt1480055?ref_=m_ttep_ep_ep1">
     <span class="text-large">
      1.
      <strong>
       Winter Is Coming
      </strong>
     </span>
     <br/>
     <span class="mobile-sprite tiny-star">
     </span>
     <strong>
      8.9
     </strong>
     17 Apr. 2011
    </a>
"""

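from bs4 import BeautifulSoup

# Name the parser explicitly so bs4 does not have to guess (and warn)
soup = BeautifulSoup(html, 'html.parser')
# Match the span by its class attribute
span = soup.find('span', attrs={'class': 'text-large'})
# The span's first child is the text node holding the episode number
ep = span.next_element.strip()
title = span.find('strong').text.strip()

print(ep)
print(title)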

> 1.
> Winter Is Coming
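
The question also asks for the score, and for the episode number as a bare '1' rather than '1.'. Here is a minimal sketch of one way to get all three values, assuming the markup is exactly as posted. It relies on the score's strong tag being a direct child of the a tag, while the title's strong tag is nested inside the span. The helper name parse_episode is hypothetical, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

html = """
<a class="btn-full" href="/title/tt1480055?ref_=m_ttep_ep_ep1">
  <span class="text-large">
    1.
    <strong>
      Winter Is Coming
    </strong>
  </span>
  <br/>
  <span class="mobile-sprite tiny-star">
  </span>
  <strong>
    8.9
  </strong>
  17 Apr. 2011
</a>
"""


def parse_episode(link):
    """Return (number, title, score) for one episode link (hypothetical helper)."""
    span = link.find('span', class_='text-large')
    # The span's first child is the text node "1."; drop whitespace and the dot
    number = int(span.next_element.strip().rstrip('.'))
    title = span.find('strong').get_text(strip=True)
    # recursive=False only looks at direct children of <a>, which skips the
    # title's <strong> (nested in the span) and finds the score's <strong>
    score = float(link.find('strong', recursive=False).get_text(strip=True))
    return number, title, score


soup = BeautifulSoup(html, 'html.parser')
number, title, score = parse_episode(soup.find('a', class_='btn-full'))

print(number)
print(title)
print(score)
```

If the page lists several episodes, the same helper could probably be applied to each result of soup.find_all('a', class_='btn-full'), provided every link follows this layout.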