python web-scraping beautifulsoup find

Beautiful Soup not returning anything I expected


Background: I'm following along with a Udemy tutorial that parses some information from Bing. The script takes user input, uses it as the search parameter for Bing, and returns all the href links it can find on the first page of results.

Code:

from bs4 import BeautifulSoup
import requests as re

search = input("Enter what you wanna search: \n")
params = {"q": search}
r = re.get("https://www.bing.com/search", params=params)

soup = BeautifulSoup(r.text, 'html.parser')

results = soup.find("ol",{"id":"b_results"})
links = results.findAll("li",{"class": "b_algo"})


for item in links:
    item_text = item.find("a").text
    item_href = item.href("a").attrs["href"]

    if item_text and item_href:
        print(item_text)
        print(item_href)

    else:
        print("Couldn't find 'a' or 'href'")

Problem: It returns nothing. The code evidently works for the instructor. I get no errors, and I've checked the id and class names on Bing itself to see whether they have changed since the video was made, but they are still the same:

"ol",{"id":"b_results"}
"li",{"class": "b_algo"}

Any ideas? I'm a complete noob at web scraping but intermediate in Python.

Thanks in advance!


Solution

  • Your code needs a bit of reworking.

    First of all, you need to send headers (a browser-like user-agent, for instance); otherwise Bing (correctly) thinks you're a bot and doesn't return the results HTML.

    Then, you need to check that each anchor actually has an href (it can be None) and that the href contains, say, at least http.

    For example:

    from bs4 import BeautifulSoup
    import requests


    # A browser-like user-agent; without it Bing may not return the results HTML.
    headers = {
        "user-agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.67 Safari/537.36",
    }
    page = requests.get("https://www.bing.com/search", headers=headers, params={"q": "python"}).text
    soup = BeautifulSoup(page, 'html.parser')

    # Print the text and target of every anchor whose href contains "http".
    for anchor in soup.find_all("a"):
        href = anchor.get("href")  # None when the anchor has no href attribute
        if href is not None and "http" in href:
            print(anchor.getText(), href)
    

    Output:

    Welcome to Python.org https://www.python.org/
    Diese Seite übersetzen http://www.microsofttranslator.com/bv.aspx?ref=SERP&br=ro&mkt=de-DE&dl=de&lp=EN_DE&a=https%3a%2f%2fwww.python.org%2f
    Python Downloads https://www.python.org/downloads/
    Windows https://www.python.org/downloads/windows/
    Python for Beginners https://www.python.org/about/gettingstarted/
    About https://www.python.org/about/
    Documentation https://www.python.org/doc/
    Community https://www.python.org/community/
    Success Stories https://www.python.org/success-stories/
    News https://www.python.org/blogs/
    Python (Programmiersprache) – Wikipedia https://de.wikipedia.org/wiki/Python_%28Programmiersprache%29
    Wikipedia https://de.wikipedia.org/wiki/Python_%28Programmiersprache%29
    CC-BY-SA-Lizenz http://creativecommons.org/licenses/by-sa/3.0/
    Python lernen - Python Kurs für Anfänger und Fortgeschrittene https://www.python-lernen.de/
    Python 3.9.0 (64bit) für Windows - Download https://python.de.uptodown.com/windows
    Python-Tutorial: Tutorial für Anfänger und Fortgeschrittene https://www.python-kurs.eu/kurs.php
    Mehr zu python-kurs.eu anzeigen https://www.python-kurs.eu/kurs.php
    Python (Programmiersprache) – Wikipedia https://de.wikipedia.org/wiki/Python_%28Programmiersprache%29
    Python (Programmiersprache) - Wikipedia https://de.wikipedia.org/wiki/Python_%28Programmiersprache%29
    

    By the way, which course is this? Scraping search engines is not easy.
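
If you would rather keep the selectors from the question ("ol" with id "b_results", "li" with class "b_algo"), the same idea applies: guard every lookup that can come back as None. Note also that `item.href("a")` in the question cannot work, because `item.href` looks for a child tag named `href` and returns None; `item.find("a")` is what was meant. Below is a minimal sketch of that approach. The function name `extract_results` and the `SAMPLE_HTML` string are hypothetical, and the sample markup is only a made-up stand-in so the snippet runs offline; it is not Bing's real HTML.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for a results page (not real Bing markup),
# used so the sketch can run without a network request.
SAMPLE_HTML = """
<ol id="b_results">
  <li class="b_algo"><h2><a href="https://www.python.org/">Welcome to Python.org</a></h2></li>
  <li class="b_algo"><h2>No link in this result</h2></li>
  <li class="b_ad"><a href="https://ads.example/">Sponsored</a></li>
</ol>
"""


def extract_results(html):
    """Return (text, href) pairs for each li.b_algo inside ol#b_results."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("ol", {"id": "b_results"})
    if container is None:
        # The page has no results list, e.g. because the request was
        # treated as coming from a bot.
        return []
    pairs = []
    for item in container.find_all("li", {"class": "b_algo"}):
        anchor = item.find("a", href=True)  # None if the item has no linked <a>
        if anchor is None:
            continue
        pairs.append((anchor.get_text(strip=True), anchor["href"]))
    return pairs


print(extract_results(SAMPLE_HTML))
print(extract_results("<html><body>blocked</body></html>"))
```

With a live page you would pass `requests.get(..., headers=headers).text` into the function instead of the sample string. An empty list then tells you the results container was missing from the response, rather than the script failing on a None.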