python web-scraping beautifulsoup data-science data-analysis

Unable to get the value of the 'href' attribute of 'a' tags, but the same approach with the 'class' attribute of 'table' tags works


import requests
from bs4 import BeautifulSoup as bs

r = requests.get('https://www.crummy.com/software/BeautifulSoup/')
soup = bs(r.text, 'html.parser')
links = [x['href'] for x in soup.find_all('a')]
links

The error is:

KeyError                                  Traceback (most recent call last)
<ipython-input-137-97ef77b6e69a> in <module>
----> 1 links=[x['href'] for x in soup.find_all('a')]
      2 links

<ipython-input-137-97ef77b6e69a> in <listcomp>(.0)
----> 1 links=[x['href'] for x in soup.find_all('a')]
      2 links

~\anaconda3\lib\site-packages\bs4\element.py in __getitem__(self, key)
   1319         """tag[key] returns the value of the 'key' attribute for the Tag,
   1320         and throws an exception if it's not there."""
-> 1321         return self.attrs[key]
   1322 
   1323     def __iter__(self):

KeyError: 'href'

But the following code works fine:

import requests
from bs4 import BeautifulSoup as bs

r = requests.get('https://en.wikipedia.org/wiki/Harvard_University')
soup = bs(r.text, 'html.parser')
classes = [table['class'] for table in soup.find_all('table')]
classes

Solution

  • The first website contains the following element:

    <a name="Download">
    

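    This anchor has no href attribute (it is not a link; it is the target of the #Download fragment), so x['href'] raises a KeyError. Tag[key] is a plain lookup in the tag's attrs dictionary, as the traceback shows. Your second snippet works only because every table on that Wikipedia page happens to have a class attribute; a single table without one would fail in the same way.

    You can use a CSS attribute selector to restrict the results to a tags that actually have an href attribute: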

    links = [x['href'] for x in soup.select('a[href]')]
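A minimal offline sketch of the failure and of three equivalent ways around it. It assumes a small hypothetical HTML string in place of the real crummy.com page (the example.com URL is made up), so it runs without network access. Besides the selector shown above, it uses find_all('a', href=True), which matches only tags where the attribute is present, and Tag.get('href'), which returns None instead of raising.

```python
from bs4 import BeautifulSoup as bs

# Hypothetical stand-in for the real page: one named anchor without href
# (like <a name="Download">) and two ordinary links.
html = """
<a name="Download"></a>
<a href="https://example.com/docs">Docs</a>
<a href="#Download">Jump to download</a>
"""
soup = bs(html, "html.parser")

# tag['href'] looks the key up in tag.attrs, so the named anchor raises KeyError.
try:
    [a["href"] for a in soup.find_all("a")]
except KeyError as exc:
    print("KeyError:", exc)

# 1. CSS attribute selector: only <a> tags that carry an href attribute.
via_select = [a["href"] for a in soup.select("a[href]")]

# 2. find_all with href=True: matches any <a> whose href attribute exists.
via_find_all = [a["href"] for a in soup.find_all("a", href=True)]

# 3. Tag.get returns None for a missing attribute instead of raising;
#    filter the Nones out afterwards if you only want real links.
via_get = [a.get("href") for a in soup.find_all("a")]
via_get_filtered = [h for h in via_get if h is not None]

print(via_select)
print(via_get)
```

The selector and href=True versions skip non-links entirely; the get version keeps one entry per a tag, which is useful when you need to line results up with the tags themselves.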