Tags: python, web-scraping, beautifulsoup, screen-scraping

How to extract only a specific kind of link from a webpage with beautifulsoup4


I'm trying to extract specific links on a page full of links. The links I need contain the word "apartment" in them.

But whatever I try, I get way more data extracted than only the links I need.

<a href="https://www.website.com/en/ad/apartment/abcd123" title target="IWEB_MAIN">

If anyone could help me out on this, it'd be much appreciated! Also, if you have a good source that could inform me better about this, it would be double appreciated!


Solution

  • You can use a regular expression via the re module.

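    import re
    from bs4 import BeautifulSoup

    # Pagesource holds the page's HTML as a string
    soup = BeautifulSoup(Pagesource, 'html.parser')
    # Keep only <a> tags whose href contains "apartment"
    alltags = soup.find_all("a", attrs={"href": re.compile("apartment")})
    for item in alltags:
        print(item['href'])  # grab the href value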
    

    Or you can use a CSS selector:

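    from bs4 import BeautifulSoup

    soup = BeautifulSoup(Pagesource, 'html.parser')
    # [href*='apartment'] matches any href containing the substring "apartment"
    alltags = soup.select("a[href*='apartment']")
    for item in alltags:
        print(item['href'])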
    

    You can find the details in the official Beautiful Soup documentation.
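To see why both approaches can still return more links than wanted, here is a minimal sketch run against an inline sample instead of a real page. The sample markup and the narrower `/ad/apartment/` pattern are assumptions based on the example link in the question, not something taken from the actual site.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the real page source.
SAMPLE_HTML = """
<a href="https://www.website.com/en/ad/apartment/abcd123" title target="IWEB_MAIN">Ad 1</a>
<a href="https://www.website.com/en/ad/house/efgh456">Ad 2</a>
<a href="https://www.website.com/en/search/apartment/for-sale">Search</a>
<a name="no-href">Anchor without href</a>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Regex filter: Beautiful Soup applies the pattern with search(),
# so "apartment" matches anywhere inside the href value.
regex_links = [a["href"] for a in soup.find_all("a", href=re.compile("apartment"))]

# CSS attribute selector: *= means "contains this substring".
css_links = [a["href"] for a in soup.select("a[href*='apartment']")]

# Narrower pattern (assumed path layout): keep only individual ad pages.
ad_links = [a["href"] for a in soup.find_all("a", href=re.compile(r"/ad/apartment/"))]

print(regex_links)
print(css_links)
print(ad_links)
```

In this sample the regex and the CSS selector select the same two links, including the search page. Only the narrower pattern drops it, so tightening the pattern is one way to cut down on unwanted matches.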

    Edited:

    You need to select the parent div first, then find the anchor tag directly inside it.

    import requests
    from bs4 import BeautifulSoup

    res = requests.get("https://www.immoweb.be/en/search/apartment/for-sale/leuven/3000")
    soup = BeautifulSoup(res.text, 'html.parser')
    # ">" restricts the match to anchors that are direct children of the result div
    for item in soup.select("div[data-type='resultgallery-resultitem'] > a[href*='apartment']"):
        print(item['href'])
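The effect of scoping the selector to a parent div can be checked offline with a small sketch. The `data-type` value is copied from the selector above, but the surrounding markup is invented for illustration and may not match the live page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; only the data-type value comes from the answer's selector.
SAMPLE_HTML = """
<div data-type="resultgallery-resultitem">
  <a href="/en/ad/apartment/1">direct child</a>
  <p><a href="/en/ad/apartment/2">nested deeper</a></p>
</div>
<div class="footer">
  <a href="/en/search/apartment">outside the result container</a>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Child combinator (>): only anchors that are direct children of the result div.
child_only = [
    a["href"]
    for a in soup.select("div[data-type='resultgallery-resultitem'] > a[href*='apartment']")
]

# Descendant combinator (space): anchors at any depth inside the result div.
any_depth = [
    a["href"]
    for a in soup.select("div[data-type='resultgallery-resultitem'] a[href*='apartment']")
]

print(child_only)
print(any_depth)
```

Neither selector returns the footer link, because it sits outside the result container. If the child combinator returns nothing on a real page, the anchors are probably nested deeper, and the descendant form may be the better fit.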