import requests
r=requests.get('https://www.crummy.com/software/BeautifulSoup/')
from bs4 import BeautifulSoup as bs
soup=bs(r.text,'html.parser')
links=[x['href'] for x in soup.find_all('a')]
links
The error is:
KeyError                                  Traceback (most recent call last)
<ipython-input-137-97ef77b6e69a> in <module>
----> 1 links=[x['href'] for x in soup.find_all('a')]
2 links
<ipython-input-137-97ef77b6e69a> in <listcomp>(.0)
----> 1 links=[x['href'] for x in soup.find_all('a')]
2 links
~\anaconda3\lib\site-packages\bs4\element.py in __getitem__(self, key)
1319 """tag[key] returns the value of the 'key' attribute for the Tag,
1320 and throws an exception if it's not there."""
-> 1321 return self.attrs[key]
1322
1323 def __iter__(self):
KeyError: 'href'
However, the following code works fine:
import requests
r=requests.get('https://en.wikipedia.org/wiki/Harvard_University')
from bs4 import BeautifulSoup as bs
soup=bs(r.text,'html.parser')
classes=[table['class'] for table in soup.find_all('table')]
classes
The first website contains the following element:
<a name="Download">
This anchor has no href attribute: it is not a link but the target of
the #Download fragment. Subscripting a tag (x['href']) raises KeyError
when the attribute is missing, which is the error you get.
You can use a CSS selector to keep only the tags that have an href
attribute:
links=[x['href'] for x in soup.select('a[href]')]
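For comparison, here is a small offline sketch of the selector approach next to two other common options. The HTML snippet and the variable names are made up for illustration and are not taken from the page in the question; only the anchor without href mirrors the real case.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one named anchor without href, two real links.
html = """
<a name="Download"></a>
<a href="/docs/">Docs</a>
<a href="https://example.com/">Example</a>
"""
soup = BeautifulSoup(html, "html.parser")

# Option 1: CSS attribute selector, matches only <a> tags that have href.
links_select = [a["href"] for a in soup.select("a[href]")]

# Option 2: find_all with href=True, the same filter without CSS syntax.
links_find_all = [a["href"] for a in soup.find_all("a", href=True)]

# Option 3: Tag.get returns None instead of raising KeyError,
# so anchors without href stay in the result as None.
links_get = [a.get("href") for a in soup.find_all("a")]

print(links_select)
print(links_find_all)
print(links_get)
```

The first two options drop the anchor without href; the third keeps a None placeholder for it, which may be useful if you need the result to line up with the full list of anchors.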