python beautifulsoup scrape

Empty list while scraping Google search results


I'm trying to scrape Google search results, but all I get as output is an empty list. Any idea what's wrong here? I found a similar post on Stack Overflow whose solution says to set a user agent. I tried that, but it still returns nothing. Please share any ideas you have.

import requests, webbrowser
from bs4 import BeautifulSoup

user_input = input("Enter something to search:")
print("googling.....")

google_search = requests.get("https://www.google.com/search?q="+user_input)
# print(google_search.text)

soup = BeautifulSoup(google_search.text , 'html.parser')
# print(soup.prettify())

search_results = soup.select('.r a')
# print(search_results)

for link in search_results[:5]:
    actual_link = link.get('href')
    print(actual_link)
    webbrowser.open('https://google.com/'+actual_link)

Solution

  • To get results from the Google search page, you have to send a User-Agent HTTP header with the request. For English results, add the hl=en parameter to the search URL:

    import requests
    from bs4 import BeautifulSoup
    
    
    headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:77.0) Gecko/20100101 Firefox/77.0'}
    
    user_input = input("Enter something to search: ")
    print("googling.....")
    
    google_search = requests.get("https://www.google.com/search?hl=en&q="+user_input, headers=headers)  # <-- add headers and hl=en parameter
    
    soup = BeautifulSoup(google_search.text , 'html.parser')
    
    search_results = soup.select('.r a')
    
    for link in search_results:
        actual_link = link.get('href')
        print(actual_link)
    

    Prints:

    Enter something to search: tree
    googling.....
    https://en.wikipedia.org/wiki/Tree
    #
    https://webcache.googleusercontent.com/search?q=cache:wHCoEH9G9w8J:https://en.wikipedia.org/wiki/Tree+&cd=22&hl=en&ct=clnk&gl=sk
    /search?hl=en&q=related:https://en.wikipedia.org/wiki/Tree+tree&tbo=1&sa=X&ved=2ahUKEwjmroPTuZLqAhVWWs0KHV4oCtsQHzAVegQIAxAH
    https://simple.wikipedia.org/wiki/Tree
    #
    https://webcache.googleusercontent.com/search?q=cache:tNzOpY417g8J:https://simple.wikipedia.org/wiki/Tree+&cd=23&hl=en&ct=clnk&gl=sk
    /search?hl=en&q=related:https://simple.wikipedia.org/wiki/Tree+tree&tbo=1&sa=X&ved=2ahUKEwjmroPTuZLqAhVWWs0KHV4oCtsQHzAWegQIARAH
    https://www.britannica.com/plant/tree
    #
    https://webcache.googleusercontent.com/search?q=cache:91hg5d2649QJ:https://www.britannica.com/plant/tree+&cd=24&hl=en&ct=clnk&gl=sk
    /search?hl=en&q=related:https://www.britannica.com/plant/tree+tree&tbo=1&sa=X&ved=2ahUKEwjmroPTuZLqAhVWWs0KHV4oCtsQHzAXegQIAhAJ
    https://www.knowablemagazine.org/article/living-world/2018/what-makes-tree-tree
    #
    https://webcache.googleusercontent.com/search?q=cache:AVSszZLtPiQJ:https://www.knowablemagazine.org/article/living-world/2018/what-makes-tree-tree+&cd=25&hl=en&ct=clnk&gl=sk
    https://teamtrees.org/
    #
    https://webcache.googleusercontent.com/search?q=cache:gVbpYoK7meUJ:https://teamtrees.org/+&cd=26&hl=en&ct=clnk&gl=sk
    https://www.ldoceonline.com/dictionary/tree
    #
    https://webcache.googleusercontent.com/search?q=cache:oyS4e3WdMX8J:https://www.ldoceonline.com/dictionary/tree+&cd=27&hl=en&ct=clnk&gl=sk
    https://en.wiktionary.org/wiki/tree
    #
    https://webcache.googleusercontent.com/search?q=cache:s_tZIjpvHZIJ:https://en.wiktionary.org/wiki/tree+&cd=28&hl=en&ct=clnk&gl=sk
    /search?hl=en&q=related:https://en.wiktionary.org/wiki/tree+tree&tbo=1&sa=X&ved=2ahUKEwjmroPTuZLqAhVWWs0KHV4oCtsQHzAbegQICBAH
    https://www.dictionary.com/browse/tree
    #
    https://webcache.googleusercontent.com/search?q=cache:EhFIP6m4MuIJ:https://www.dictionary.com/browse/tree+&cd=29&hl=en&ct=clnk&gl=sk
    https://www.treepeople.org/tree-benefits
    #
    https://webcache.googleusercontent.com/search?q=cache:4wLYFp4zTuUJ:https://www.treepeople.org/tree-benefits+&cd=30&hl=en&ct=clnk&gl=sk
    

    EDIT: To filter out the '#' anchors, the cached copies, and the related-search links, you can use this:

    import requests
    from bs4 import BeautifulSoup
    
    
    headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:77.0) Gecko/20100101 Firefox/77.0'}
    
    user_input = input("Enter something to search: ")
    print("googling.....")
    
    google_search = requests.get("https://www.google.com/search?hl=en&q="+user_input, headers=headers)  # <-- add headers and hl=en parameter
    
    soup = BeautifulSoup(google_search.text , 'html.parser')
    
    search_results = soup.select('.r a')
    
    for link in search_results:
        actual_link = link.get('href')
        # Skip anchors with no href, in-page '#' links, cached copies and related-search links
        if not actual_link or actual_link.startswith(
                ('#', 'https://webcache.googleusercontent.com', '/search?')):
            continue
        print(actual_link)
    

    Prints (for example):

    Enter something to search: tree
    googling.....
    https://en.wikipedia.org/wiki/Tree
    https://simple.wikipedia.org/wiki/Tree
    https://www.britannica.com/plant/tree
    https://www.knowablemagazine.org/article/living-world/2018/what-makes-tree-tree
    https://teamtrees.org/
    https://www.ldoceonline.com/dictionary/tree
    https://en.wiktionary.org/wiki/tree
    https://www.dictionary.com/browse/tree
    https://www.treepeople.org/tree-benefits
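
The two ideas in the answer (adding hl=en to the search URL and skipping Google-internal links) can also be pulled into small helpers that are easy to check without making a network request. This is only a sketch: the names build_search_url, keep_result_links and SKIPPED_PREFIXES are made up for illustration and are not part of the original answer, and the percent-encoding of the query via urllib.parse.urlencode is an addition, so that input containing spaces or '&' does not break the URL.

```python
from urllib.parse import urlencode

# Prefixes of the Google-internal links that the answer's loop skips.
SKIPPED_PREFIXES = (
    "#",
    "https://webcache.googleusercontent.com",
    "/search?",
)


def build_search_url(query, lang="en"):
    """Return a Google search URL with hl and q percent-encoded (hypothetical helper)."""
    return "https://www.google.com/search?" + urlencode({"hl": lang, "q": query})


def keep_result_links(hrefs):
    """Drop missing hrefs and Google-internal links, keeping the original order."""
    return [h for h in hrefs if h and not h.startswith(SKIPPED_PREFIXES)]


if __name__ == "__main__":
    # Sample hrefs shaped like the output shown above; None stands for an <a> without href.
    sample = [
        "https://en.wikipedia.org/wiki/Tree",
        "#",
        None,
        "https://webcache.googleusercontent.com/search?q=cache:example",
        "/search?hl=en&q=related:example",
        "https://teamtrees.org/",
    ]
    print(build_search_url("oak & maple trees"))
    for link in keep_result_links(sample):
        print(link)
```

In the script above, the list passed to keep_result_links would be built as [a.get('href') for a in soup.select('.r a')], and the URL returned by build_search_url would be passed to requests.get together with the same headers dictionary.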