I've stripped the following code from IMDB's mobile site using BeautifulSoup, with Python 2.7.
I want to create a separate object for the episode number '1', title 'Winter is Coming', and IMDB score '8.9'. Can't seem to figure out how to split apart the episode number and the title.
<a class="btn-full" href="/title/tt1480055?ref_=m_ttep_ep_ep1">
<span class="text-large">
1.
<strong>
Winter Is Coming
</strong>
</span>
<br/>
<span class="mobile-sprite tiny-star">
</span>
<strong>
8.9
</strong>
17 Apr. 2011
</a>
You can use find
to locate the span
with the class text-large
to the specific element you need.
Once you have your desired span
, you can use next
to grab the next line, containing the episode number and find
to locate the strong
containing the title
html = """
<a class="btn-full" href="/title/tt1480055?ref_=m_ttep_ep_ep1">
<span class="text-large">
1.
<strong>
Winter Is Coming
</strong>
</span>
<br/>
<span class="mobile-sprite tiny-star">
</span>
<strong>
8.9
</strong>
17 Apr. 2011
</a>
"""
from bs4 import BeautifulSoup
soup = BeautifulSoup(html)
span = soup.find('span', attrs={'text-large'})
ep = str(span.next).strip()
title = str(span.find('strong').text).strip()
print ep
print title
> 1.
> Winter Is Coming