python html parsing web-scraping beautifulsoup

Parsing for Specific Text in HTML href


I'm trying to get only the links whose href contains the text /Archive.aspx?ADID=, but I always get every link on the webpage instead. Once I have the links I want, how would I navigate to each of those pages?

from bs4 import BeautifulSoup, SoupStrainer
import requests

url = "https://www.ci.atherton.ca.us/Archive.aspx?AMID=41"
key = '/Archive.aspx?ADID='

page = requests.get(url)
data = page.text
soup = BeautifulSoup(data, "html.parser")

for link in soup.find_all('a'):
    if 'Archive.aspx?ADID=' in page.text: 
        print(link.get('href'))

Solution

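  • Your loop tests page.text, the text of the whole page, which always contains the key, so every link passes the check. Test each link's own href instead: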

    import requests
    from urllib.parse import urljoin
    from bs4 import BeautifulSoup
    
    url = "https://www.ci.atherton.ca.us/Archive.aspx?AMID=41"
    key = "Archive.aspx?ADID="
    
    soup = BeautifulSoup(requests.get(url).content, "html.parser")
    
    for link in soup.find_all("a"):
        href = link.get("href", "")
        # Check this link's own href, not the text of the whole page
        if key in href:
            # urljoin resolves a relative href against the page URL
            print(urljoin(url, href))
    

    Prints:

    https://www.ci.atherton.ca.us/Archive.aspx?ADID=3581
    https://www.ci.atherton.ca.us/Archive.aspx?ADID=3570
    https://www.ci.atherton.ca.us/Archive.aspx?ADID=3564
    https://www.ci.atherton.ca.us/Archive.aspx?ADID=3559
    https://www.ci.atherton.ca.us/Archive.aspx?ADID=3556
    https://www.ci.atherton.ca.us/Archive.aspx?ADID=3554
    https://www.ci.atherton.ca.us/Archive.aspx?ADID=3552
    
    ...and so on.
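For the second part of the question (navigating to each of those pages), here is a minimal sketch. It assumes every archive page can be fetched with a plain GET request and that parsing each page into a BeautifulSoup object is what you want to do next. The helper names archive_links and visit_all are hypothetical, chosen only for illustration. It uses SoupStrainer (already imported in the question) so that only the matching <a> tags are parsed, and urllib.parse.urljoin to turn relative hrefs into absolute URLs.

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup, SoupStrainer

BASE_URL = "https://www.ci.atherton.ca.us/Archive.aspx?AMID=41"
KEY = "Archive.aspx?ADID="


def archive_links(html, base_url=BASE_URL, key=KEY):
    """Hypothetical helper: absolute URLs of <a> tags whose href contains key."""
    # The callable receives the href value, or None when the tag has no href.
    only_matching = SoupStrainer("a", href=lambda h: h is not None and key in h)
    soup = BeautifulSoup(html, "html.parser", parse_only=only_matching)
    urls = []
    for a in soup.find_all("a"):
        absolute = urljoin(base_url, a["href"])
        # Skip repeats in case the same document is linked more than once
        if absolute not in urls:
            urls.append(absolute)
    return urls


def visit_all(urls, session=None):
    """Hypothetical helper: fetch each URL and yield (url, parsed page) pairs."""
    if session is None:
        session = requests.Session()  # one session reuses the connection
    for page_url in urls:
        resp = session.get(page_url, timeout=30)
        resp.raise_for_status()  # stop on 4xx/5xx instead of parsing an error page
        yield page_url, BeautifulSoup(resp.content, "html.parser")
```

Usage, assuming network access: `for page_url, page in visit_all(archive_links(requests.get(BASE_URL, timeout=30).content)): print(page_url, page.title)`. Because visit_all is a generator, each page is downloaded only when the loop reaches it, so you can stop early without fetching the rest; adding a short time.sleep between fetches would also be kinder to the server.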