python python-3.x web-scraping beautifulsoup python-requests

Scrape Google Quick Answer Box in Python


I am very new to Python programming and I am trying to make a simple application.

What I'm trying to do is search for some text on Google and return the links. My program does this fine. The other thing is that if Google shows a quick answer, like in the photo below, I want to grab it, and this is where my problem lies. I searched online and found very few topics on this, and none of the code in them works.

[Screenshot: Google quick answer box]

By examining the source of many result pages, I noticed that the answer is always in an element with the class _XWk. But when I fetch the page in Python and search for this class, it isn't found. I have tried many ways of scraping the page, but none of them return this class, and I think the HTML that Python receives is shorter than what the browser shows me when I view the page source.

[Screenshot: the _XWk class in the page source]


Code:

import requests, lxml
from bs4 import BeautifulSoup

url = 'https://www.google.com/search?q=when%20was%20trump%20born'
h = {"User-Agent":"Chrome/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36"}

r = requests.get(url, headers=h).text
soup = BeautifulSoup(r,'lxml')

soup.find_all("div", class_="_xwk")
print (soup)

Any help is appreciated.


Solution

  • The line soup.find_all("div", class_="_xwk") has no effect in your code. The find_all() function returns a list of the tags that match the given parameters, so you need to save this result in a variable.

    But, as you need only one such tag, you can use find(), which returns the first matching tag.

    Finally, to get the text inside a tag, you have to use the .text attribute.

    Also, the class name is case sensitive. In the inspection, the class name is _XWk and not _xwk. With these changes, the code becomes:

    import requests
    from bs4 import BeautifulSoup

    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.186 Safari/537.36'}
    r = requests.get('https://www.google.com/search?q=when%20was%20trump%20born', headers=headers)
    soup = BeautifulSoup(r.text, 'lxml')

    # find() returns the first matching tag, or None if the class is not in the response
    result = soup.find('div', class_='_XWk')
    if result is not None:
        print(result.text)
        # 14 June 1946 (age 71)
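
To see the find() and case-sensitivity points without depending on a live request, here is a minimal sketch that runs against a static HTML string. The SAMPLE_HTML markup and the quick_answer helper are invented for illustration and are not Google's real page structure. The sketch uses Python's built-in 'html.parser' so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Static stand-in for a results page; this markup is made up for the example.
SAMPLE_HTML = """
<html><body>
  <div class="_XWk">14 June 1946 (age 71)</div>
  <div class="other">unrelated</div>
</body></html>
"""


def quick_answer(html, class_name="_XWk"):
    """Return the text of the first <div> with the given class, or None."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("div", class_=class_name)  # first match, or None
    return tag.get_text(strip=True) if tag is not None else None


print(quick_answer(SAMPLE_HTML))          # exact-case class name matches
print(quick_answer(SAMPLE_HTML, "_xwk"))  # lower-case name does not match
```

Returning None when the tag is missing lets the caller tell "no quick answer in this response" apart from a crash on result.text.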