Background:
I'm following along with a Udemy tutorial that parses some information from Bing.
It takes user input, uses it as the search parameter for Bing, and returns all the href
links it can find on the first page.
Code:
from bs4 import BeautifulSoup
import requests as re

search = input("Enter what you wanna search: \n")
params = {"q": search}
r = re.get("https://www.bing.com/search", params=params)
soup = BeautifulSoup(r.text, 'html.parser')
results = soup.find("ol", {"id": "b_results"})
links = results.findAll("li", {"class": "b_algo"})
for item in links:
    item_text = item.find("a").text
    item_href = item.find("a").attrs["href"]
    if item_text and item_href:
        print(item_text)
        print(item_href)
    else:
        print("Couldn't find 'a' or 'href'")
Problem:
It returns nothing. The code obviously works for the instructor. I get no errors, and I've checked the id
and class names to see whether they've changed on Bing since the video was made, but they are still the same:
"ol",{"id":"b_results"}
"li",{"class": "b_algo"}
Any ideas? I'm a complete noob at web scraping but intermediate in Python.
Thanks in advance!
Your code needs a bit of reworking.
First of all, you need headers; otherwise Bing (correctly) thinks you're a bot and doesn't return the HTML.
Then, you need to check that the anchors are not None and, say, have at least http in the href.
For example:
from bs4 import BeautifulSoup
import requests

# A browser-like User-Agent, so Bing doesn't treat the request as a bot.
headers = {
    "user-agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.67 Safari/537.36",
}
page = requests.get("https://www.bing.com/search?", headers=headers, params={"q": "python"}).text
soup = BeautifulSoup(page, 'html.parser')
anchors = soup.find_all("a")
for anchor in anchors:
    if anchor is not None:
        try:
            # Keep only anchors whose href contains "http".
            if "http" in anchor["href"]:
                print(anchor.getText(), anchor["href"])
        except KeyError:
            # This <a> has no href attribute; skip it.
            continue
Output:
Welcome to Python.org https://www.python.org/
Diese Seite übersetzen http://www.microsofttranslator.com/bv.aspx?ref=SERP&br=ro&mkt=de-DE&dl=de&lp=EN_DE&a=https%3a%2f%2fwww.python.org%2f
Python Downloads https://www.python.org/downloads/
Windows https://www.python.org/downloads/windows/
Python for Beginners https://www.python.org/about/gettingstarted/
About https://www.python.org/about/
Documentation https://www.python.org/doc/
Community https://www.python.org/community/
Success Stories https://www.python.org/success-stories/
News https://www.python.org/blogs/
Python (Programmiersprache) – Wikipedia https://de.wikipedia.org/wiki/Python_%28Programmiersprache%29
Wikipedia https://de.wikipedia.org/wiki/Python_%28Programmiersprache%29
CC-BY-SA-Lizenz http://creativecommons.org/licenses/by-sa/3.0/
Python lernen - Python Kurs für Anfänger und Fortgeschrittene https://www.python-lernen.de/
Python 3.9.0 (64bit) für Windows - Download https://python.de.uptodown.com/windows
Python-Tutorial: Tutorial für Anfänger und Fortgeschrittene https://www.python-kurs.eu/kurs.php
Mehr zu python-kurs.eu anzeigen https://www.python-kurs.eu/kurs.php
Python (Programmiersprache) – Wikipedia https://de.wikipedia.org/wiki/Python_%28Programmiersprache%29
Python (Programmiersprache) - Wikipedia https://de.wikipedia.org/wiki/Python_%28Programmiersprache%29
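As the output shows, grabbing every anchor also picks up translation links, licence links and duplicates. If you would rather keep the tutorial's approach and only look inside the result items, something like the following might work. This is only a sketch: the id b_results and the class b_algo come from your question, while the function name extract_results and the SAMPLE_HTML markup are made up for illustration, and Bing's real markup may differ or change at any time.

```python
from bs4 import BeautifulSoup


def extract_results(html):
    """Return (title, href) pairs found inside li.b_algo items of ol#b_results."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("ol", {"id": "b_results"})
    if container is None:
        # The results list is missing, e.g. a different page was served.
        return []
    pairs = []
    for item in container.find_all("li", {"class": "b_algo"}):
        # First <a> inside the item that actually has an href attribute.
        anchor = item.find("a", href=True)
        if anchor is None:
            continue
        href = anchor["href"]
        # Skip relative or javascript links; keep absolute http(s) URLs.
        if href.startswith("http"):
            pairs.append((anchor.get_text(strip=True), href))
    return pairs


# Hypothetical markup that only imitates the structure from the question.
SAMPLE_HTML = """
<ol id="b_results">
  <li class="b_algo"><h2><a href="https://www.python.org/">Welcome to Python.org</a></h2></li>
  <li class="b_algo"><h2><a href="/relative/link">Relative link</a></h2></li>
  <li class="b_algo"><p>No anchor here</p></li>
  <li class="b_ad"><a href="https://ads.example.com/">Ad</a></li>
</ol>
"""

if __name__ == "__main__":
    for title, href in extract_results(SAMPLE_HTML):
        print(title, href)
```

To try it on a live page, pass in the page text fetched with the headers from the example above instead of SAMPLE_HTML. Returning an empty list when the container is missing avoids the AttributeError you would otherwise get from calling a method on None.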
By the way, what course is this? Scraping search engines is not easy.