This is the html I am trying to scrape:
<span class="meta-attributes__attr-tags">
<a href="/tags/cinematic" title="cinematic">cinematic</a>,
<a href="/tags/dissolve" title="dissolve">dissolve</a>,
<a href="/tags/epic" title="epic">epic</a>,
<a href="/tags/fly" title="fly">fly</a>,
</span>
I want to get the anchor text for each a href: cinematic, dissolve, epic, etc.
This is the code I have:
url = urllib2.urlopen("http: example.com")
content = url.read()
soup = BeautifulSoup(content)
links = soup.find_all("span", {"class": "meta-attributes__attr-tags"})
for link in links:
print link.find_all('a')['href']
If I do it with "link.find_all" I get error: TypeError: List indices must be integers, not str.
But if I do print link.find('a')['href'] I get the first one only.
How can I get all of them ?
You could do the following:
from bs4 import BeautifulSoup
content = '''
<span class="meta-attributes__attr-tags">
<a href="/tags/cinematic" title="cinematic">cinematic</a>,
<a href="/tags/dissolve" title="dissolve">dissolve</a>,
<a href="/tags/epic" title="epic">epic</a>,
<a href="/tags/fly" title="fly">fly</a>,
</span>
'''
soup = BeautifulSoup(content)
spans = soup.find_all("span", {"class": "meta-attributes__attr-tags"})
for span in spans:
links = span.find_all('a')
for link in links:
print link['href']
Output
/tags/cinematic
/tags/dissolve
/tags/epic
/tags/fly