
Why is nothing getting parsed in my web scraping program?


I wrote this code to get the top links from a Google search, but it's returning None.

import webbrowser, requests
from bs4 import BeautifulSoup
string = 'selena+gomez'
website = f'http://google.com/search?q={string}'
req_web = requests.get(website).text
parser = BeautifulSoup(req_web, 'html.parser')
gotolink = parser.find('div', class_='r').a["href"]
print(gotolink)

Solution

  • Google only returns the full results page when the request includes a User-Agent HTTP header that looks like a real browser. Without one, it serves a different page that contains no <div> tags with the class r, so find() returns None. You can see the difference by running print(parser) with and without the header.

    For example:

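    import requests
    from bs4 import BeautifulSoup
    
    string = 'selena+gomez'
    # Identify as a desktop browser so Google serves the regular results page.
    headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:76.0) Gecko/20100101 Firefox/76.0'}
    # hl=en requests the English-language version of the page.
    website = f'http://google.com/search?hl=en&q={string}'
    
    req_web = requests.get(website, headers=headers).text
    parser = BeautifulSoup(req_web, 'html.parser')
    # Take the href of the first <a> inside the first <div class="r">.
    gotolink = parser.find('div', class_='r').a["href"]
    print(gotolink)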
    

    Prints:

    https://www.instagram.com/selenagomez/?hl=en
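
Since find() returns None when no matching tag exists, chaining .a["href"] directly onto it raises an error whenever the page lacks a <div> with class r. A minimal sketch of a guarded lookup follows. The helper name first_result_link and the sample markup are hypothetical and used only for illustration, and the class name r is taken from the code above and may not match the markup Google serves at any given time.

```python
from bs4 import BeautifulSoup


def first_result_link(html, css_class="r"):
    """Return the href of the first <a> in the first matching <div>, or None.

    first_result_link is a hypothetical helper, not part of requests or bs4.
    """
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", class_=css_class)
    # find() returns None when no <div> has the class; check before using it.
    if container is None or container.a is None:
        return None
    return container.a.get("href")


# Hypothetical sample markup so the check runs without a network request.
with_result = '<div class="r"><a href="https://example.com/">Example</a></div>'
without_result = '<div class="other"><p>No results here</p></div>'

print(first_result_link(with_result))     # https://example.com/
print(first_result_link(without_result))  # None
```

With a live request, the same helper could be called as first_result_link(req_web), and a None result is a hint to inspect the returned HTML rather than a crash.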