Trying to pull all the links from a site. I get "KeyError: 'href'", but as I understand it, that error only occurs when a tag is missing an href attribute. However, when I look through the soup object, EVERY a tag has an href, so I don't see why I'm getting this error. I've looked this up a lot and everyone only ever mentions a tags without href.
from bs4 import BeautifulSoup
from datetime import datetime
import pandas as pd
import requests
page_count = 1
catalog_page = f"https://40kaudio.com/page/{str(page_count)}/?s"

while page_count < 4:
    print(f"Begin Book Scrape from {catalog_page}")
    # Soup opens the page.
    open_page = requests.get(catalog_page)
    # We create a soup object that has all the page stuff in it
    soup = BeautifulSoup(open_page.content, "html.parser")
    # We iterate through that soup object and pull out anything with a class of "title-post"
    for link in soup.find_all('h2', "title-post"):
        print(link['href'])
    else:
        print('By the Emperor!')
There is no href attribute in link (the h2 tag itself). However, there is an a tag inside link, and there is an href attribute in that a tag:
<a href="https://40kaudio.com/justin-d-hill-cadia-stands-audiobook/" rel="bookmark">Justin D. Hill – Cadia Stands Audiobook</a>
for link in soup.find_all('h2', "title-post"):
    print(link.a['href'])
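You can verify both behaviors against the sample markup from the question: indexing the h2 raises the KeyError, while indexing the a inside it works.

```python
from bs4 import BeautifulSoup

# The <h2> element exactly as shown in the question.
html = ('<h2 class="title-post">'
        '<a href="https://40kaudio.com/justin-d-hill-cadia-stands-audiobook/" '
        'rel="bookmark">Justin D. Hill – Cadia Stands Audiobook</a></h2>')

soup = BeautifulSoup(html, "html.parser")
link = soup.find('h2', "title-post")

# link is the <h2>, which has no href of its own, so subscripting raises KeyError.
try:
    link['href']
except KeyError as e:
    print("KeyError:", e)  # → KeyError: 'href'

# The <a> child does carry the attribute.
print(link.a['href'])  # → https://40kaudio.com/justin-d-hill-cadia-stands-audiobook/
```

If some h2 elements might lack an a child, `link.a` would be None; `link.find('a')` with a None check, or `link.a.get('href')` after confirming the tag exists, avoids an AttributeError there.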
Don't forget to increment page_count in your while loop, and to rebuild catalog_page from the new value: the f-string is evaluated once before the loop, so as written the loop fetches page 1 forever.
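Putting the two fixes together, a minimal corrected sketch of the loop (same URL pattern as the question; the helper and function names here are my own, not from the original) could look like:

```python
from bs4 import BeautifulSoup
import requests

def catalog_url(page_count):
    # Rebuild the URL from the current count each iteration,
    # rather than formatting it once before the loop.
    return f"https://40kaudio.com/page/{page_count}/?s"

def scrape_pages(last_page=3):
    page_count = 1
    while page_count <= last_page:
        print(f"Begin Book Scrape from {catalog_url(page_count)}")
        open_page = requests.get(catalog_url(page_count))
        soup = BeautifulSoup(open_page.content, "html.parser")
        for link in soup.find_all('h2', "title-post"):
            # Read href from the <a> inside the <h2>, not from the <h2> itself.
            print(link.a['href'])
        page_count += 1  # without this, catalog_url(page_count) never changes
    print('By the Emperor!')
```

Calling `scrape_pages()` fetches pages 1 through 3, matching the original `while page_count < 4`.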