I'm trying to scrape Google Search Result but all I'm getting as an output is empty list. Do you have any idea what's wrong here? I found the similar post on Stack Overflow where solution says you should try putting user_agent. I tried but it still returns nothing. Please share if you have any idea.
import requests, webbrowser
from bs4 import BeautifulSoup
user_input = input("Enter something to search:")
print("googling.....")
google_search = requests.get("https://www.google.com/search?q="+user_input)
# print(google_search.text)
soup = BeautifulSoup(google_search.text , 'html.parser')
# print(soup.prettify())
search_results = soup.select('.r a')
# print(search_results)
for link in search_results[:5]:
actual_link = link.get('href')
print(actual_link)
webbrowser.open('https://google.com/'+actual_link)
To get results from Google page, you have to specify User-Agent
http header. For english results, add hl=en
parameter to search URL:
import requests
from bs4 import BeautifulSoup
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:77.0) Gecko/20100101 Firefox/77.0'}
user_input = input("Enter something to search: ")
print("googling.....")
google_search = requests.get("https://www.google.com/search?hl=en&q="+user_input, headers=headers) # <-- add headers and hl=en parameter
soup = BeautifulSoup(google_search.text , 'html.parser')
search_results = soup.select('.r a')
for link in search_results:
actual_link = link.get('href')
print(actual_link)
Prints:
Enter something to search: tree
googling.....
https://en.wikipedia.org/wiki/Tree
#
https://webcache.googleusercontent.com/search?q=cache:wHCoEH9G9w8J:https://en.wikipedia.org/wiki/Tree+&cd=22&hl=en&ct=clnk&gl=sk
/search?hl=en&q=related:https://en.wikipedia.org/wiki/Tree+tree&tbo=1&sa=X&ved=2ahUKEwjmroPTuZLqAhVWWs0KHV4oCtsQHzAVegQIAxAH
https://simple.wikipedia.org/wiki/Tree
#
https://webcache.googleusercontent.com/search?q=cache:tNzOpY417g8J:https://simple.wikipedia.org/wiki/Tree+&cd=23&hl=en&ct=clnk&gl=sk
/search?hl=en&q=related:https://simple.wikipedia.org/wiki/Tree+tree&tbo=1&sa=X&ved=2ahUKEwjmroPTuZLqAhVWWs0KHV4oCtsQHzAWegQIARAH
https://www.britannica.com/plant/tree
#
https://webcache.googleusercontent.com/search?q=cache:91hg5d2649QJ:https://www.britannica.com/plant/tree+&cd=24&hl=en&ct=clnk&gl=sk
/search?hl=en&q=related:https://www.britannica.com/plant/tree+tree&tbo=1&sa=X&ved=2ahUKEwjmroPTuZLqAhVWWs0KHV4oCtsQHzAXegQIAhAJ
https://www.knowablemagazine.org/article/living-world/2018/what-makes-tree-tree
#
https://webcache.googleusercontent.com/search?q=cache:AVSszZLtPiQJ:https://www.knowablemagazine.org/article/living-world/2018/what-makes-tree-tree+&cd=25&hl=en&ct=clnk&gl=sk
https://teamtrees.org/
#
https://webcache.googleusercontent.com/search?q=cache:gVbpYoK7meUJ:https://teamtrees.org/+&cd=26&hl=en&ct=clnk&gl=sk
https://www.ldoceonline.com/dictionary/tree
#
https://webcache.googleusercontent.com/search?q=cache:oyS4e3WdMX8J:https://www.ldoceonline.com/dictionary/tree+&cd=27&hl=en&ct=clnk&gl=sk
https://en.wiktionary.org/wiki/tree
#
https://webcache.googleusercontent.com/search?q=cache:s_tZIjpvHZIJ:https://en.wiktionary.org/wiki/tree+&cd=28&hl=en&ct=clnk&gl=sk
/search?hl=en&q=related:https://en.wiktionary.org/wiki/tree+tree&tbo=1&sa=X&ved=2ahUKEwjmroPTuZLqAhVWWs0KHV4oCtsQHzAbegQICBAH
https://www.dictionary.com/browse/tree
#
https://webcache.googleusercontent.com/search?q=cache:EhFIP6m4MuIJ:https://www.dictionary.com/browse/tree+&cd=29&hl=en&ct=clnk&gl=sk
https://www.treepeople.org/tree-benefits
#
https://webcache.googleusercontent.com/search?q=cache:4wLYFp4zTuUJ:https://www.treepeople.org/tree-benefits+&cd=30&hl=en&ct=clnk&gl=sk
EDIT: To filter results you can use this:
import requests
from bs4 import BeautifulSoup
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:77.0) Gecko/20100101 Firefox/77.0'}
user_input = input("Enter something to search: ")
print("googling.....")
google_search = requests.get("https://www.google.com/search?hl=en&q="+user_input, headers=headers) # <-- add headers and hl=en parameter
soup = BeautifulSoup(google_search.text , 'html.parser')
search_results = soup.select('.r a')
for link in search_results:
actual_link = link.get('href')
if actual_link.startswith('#') or \
actual_link.startswith('https://webcache.googleusercontent.com') or \
actual_link.startswith('/search?'):
continue
print(actual_link)
Prints (for example):
Enter something to search: tree
googling.....
https://en.wikipedia.org/wiki/Tree
https://simple.wikipedia.org/wiki/Tree
https://www.britannica.com/plant/tree
https://www.knowablemagazine.org/article/living-world/2018/what-makes-tree-tree
https://teamtrees.org/
https://www.ldoceonline.com/dictionary/tree
https://en.wiktionary.org/wiki/tree
https://www.dictionary.com/browse/tree
https://www.treepeople.org/tree-benefits