
Beautiful Soup KeyError 'href' but they definitely exist


I'm trying to pull all the links from a site, but I get "KeyError: 'href'". As I understand it, this error only happens when some a tags have no href. However, when I look through the soup object, EVERY a tag has an href, so I don't see why I'm getting this error. I've searched a lot, and every answer just mentions a tags without an href.

from bs4 import BeautifulSoup
from datetime import datetime
import pandas as pd
import requests

page_count = 1
catalog_page = f"https://40kaudio.com/page/{str(page_count)}/?s"

while page_count < 4:
    print(f"Begin Book Scrape from {catalog_page}")
    # requests fetches the page.
    open_page = requests.get(catalog_page)
    # We create a soup object that has all the page stuff in it
    soup = BeautifulSoup(open_page.content, "html.parser")
    # We iterate through that soup object and pull out anything with a class of "title-post"
    for link in soup.find_all('h2', "title-post"):
        print(link['href'])

else:
    print('By the Emperor!')
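The premise that KeyError only comes from a tags without an href can be checked in isolation. The sketch below uses a made-up HTML snippet (the URL is hypothetical) shaped like the site's post titles: an a tag nested inside an h2 with class title-post. It assumes bs4 is installed and needs no network access. It shows that subscripting a Tag with ['href'] looks only at that tag's own attributes, so it fails on the h2 even though every a tag has an href.

```python
from bs4 import BeautifulSoup

# Hypothetical markup in the same shape as the site's post titles.
html = """
<h2 class="title-post">
  <a href="https://example.com/some-book/" rel="bookmark">Some Book</a>
</h2>
"""
soup = BeautifulSoup(html, "html.parser")

# Every <a> tag in this snippet really does have an href...
print(all(a.has_attr("href") for a in soup.find_all("a")))  # True

# ...but searching for "h2" returns the <h2> tag, not the <a> inside it.
h2 = soup.find("h2", class_="title-post")
print(h2.name)  # h2

# Tag["attr"] only checks the attributes of the tag being indexed.
try:
    h2["href"]
except KeyError as err:
    print("KeyError:", err)  # KeyError: 'href'

print(h2.get("href"))  # None: .get() returns None instead of raising
print(h2.a["href"])    # https://example.com/some-book/ (from the nested <a>)
```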

Solution

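  • There is no href attribute on link itself. Each link returned by soup.find_all('h2', "title-post") is an h2 tag, and the h2 has no href, so link['href'] raises KeyError. The href belongs to the a tag nested inside the h2: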

    <a href="https://40kaudio.com/justin-d-hill-cadia-stands-audiobook/" rel="bookmark">Justin D. Hill &#8211; Cadia Stands Audiobook</a>
    
    So read href from the nested a tag instead:
    
    for link in soup.find_all('h2', "title-post"):
        print(link.a['href'])
    

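    Don't forget to increment page_count inside your while loop; otherwise the loop never ends. Also rebuild catalog_page inside the loop: the f-string is evaluated only once, before the loop starts, so every request would still fetch page 1.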
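Putting both fixes together, here is one possible version of the scraper. It is a sketch under stated assumptions, not the asker's final code: the helper names page_urls, extract_links and scrape are hypothetical, it keeps the original page range (pages 1 to 3, matching page_count < 4), and it quietly skips any title h2 that has no a tag with an href instead of crashing. Parsing is kept separate from the network call so it can be tried on a static HTML string; scrape() itself assumes requests is installed and the site is reachable.

```python
from bs4 import BeautifulSoup

# URL pattern from the question; {} is filled with the page number.
BASE_URL = "https://40kaudio.com/page/{}/?s"


def page_urls(first, last):
    """Build one catalog URL per page number (first..last inclusive)."""
    return [BASE_URL.format(page) for page in range(first, last + 1)]


def extract_links(html):
    """Return the href of the <a> nested in each <h2 class="title-post">."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for title in soup.find_all("h2", class_="title-post"):
        anchor = title.find("a", href=True)  # only <a> tags that have an href
        if anchor is not None:
            links.append(anchor["href"])
    return links


def scrape(first=1, last=3):
    """Fetch each catalog page and collect the post links (needs network)."""
    import requests  # imported here so the parsing helpers work without it

    all_links = []
    for url in page_urls(first, last):
        print(f"Begin Book Scrape from {url}")
        response = requests.get(url, timeout=30)
        response.raise_for_status()
        all_links.extend(extract_links(response.content))
    return all_links
```

Calling scrape() and printing each returned link does what the original loop was meant to do. The for loop over page_urls replaces the manual page_count counter, so there is no increment to forget and each request gets its own URL.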