I'm trying to only get the links that contain the text /Archive.aspx?ADID=. However, I always get all the links on the webpage instead. After I get the links I want, how would I navigate to each of those pages?
from bs4 import BeautifulSoup, SoupStrainer
import requests
url = "https://www.ci.atherton.ca.us/Archive.aspx?AMID=41"
key = '/Archive.aspx?ADID='
page = requests.get(url)
data = page.text
soup = BeautifulSoup(data)
for link in soup.find_all('a'):
    if 'Archive.aspx?ADID=' in page.text:
        print(link.get('href'))
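Your condition checks page.text, which is the source of the whole page. That always contains the key, so every link gets printed. Test each link's href instead. Try: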
import requests
from bs4 import BeautifulSoup
url = "https://www.ci.atherton.ca.us/Archive.aspx?AMID=41"
key = "Archive.aspx?ADID="
soup = BeautifulSoup(requests.get(url).content, "html.parser")
for link in soup.find_all("a"):
    if key in link.get("href", ""):
        print("https://www.ci.atherton.ca.us/" + link.get("href"))
Prints:
https://www.ci.atherton.ca.us/Archive.aspx?ADID=3581
https://www.ci.atherton.ca.us/Archive.aspx?ADID=3570
https://www.ci.atherton.ca.us/Archive.aspx?ADID=3564
https://www.ci.atherton.ca.us/Archive.aspx?ADID=3559
https://www.ci.atherton.ca.us/Archive.aspx?ADID=3556
https://www.ci.atherton.ca.us/Archive.aspx?ADID=3554
https://www.ci.atherton.ca.us/Archive.aspx?ADID=3552
...and so on.
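As for navigating to each of those pages: one way is to collect the links into a list and request them one at a time. The following is only a sketch. The helper names `extract_links` and `visit_links` are made up for illustration, the 10-second timeout is an arbitrary choice, and what you pull out of each page (here just the title) is a placeholder, since the question doesn't say what you need. It uses `urljoin` instead of string concatenation so that both relative and root-relative hrefs resolve correctly.

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

BASE_URL = "https://www.ci.atherton.ca.us/Archive.aspx?AMID=41"
KEY = "Archive.aspx?ADID="


def extract_links(html, base_url=BASE_URL, key=KEY):
    """Return absolute URLs for every <a> whose href contains `key`."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        urljoin(base_url, a["href"])            # resolve relative hrefs
        for a in soup.find_all("a", href=True)  # skip anchors without href
        if key in a["href"]
    ]


def visit_links(links, session=None):
    """Fetch each link in turn and yield (url, parsed page) pairs."""
    session = session or requests.Session()     # reuse one connection
    for link in links:
        response = session.get(link, timeout=10)
        response.raise_for_status()             # stop on HTTP errors
        yield link, BeautifulSoup(response.content, "html.parser")


def crawl(index_url=BASE_URL):
    """Hypothetical driver: print the title of every archive page."""
    index_html = requests.get(index_url, timeout=10).content
    for link, page in visit_links(extract_links(index_html, index_url)):
        # Placeholder: replace with whatever you need from each page.
        print(link, page.title.string if page.title else None)
```

Calling `crawl()` fetches the index page and then each archive page in order. If there are many links, consider adding a short `time.sleep` between requests to be polite to the server.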