
.text issue when scraping Google Flights


Here's my code:

from bs4 import BeautifulSoup
import requests
import time

html_text = requests.get('https://www.google.com/travel/flights/search?tfs=CBwQAhoeagcIARIDSkZLEgoyMDIzLTA3LTAzcgcIARIDQU1TGh5qBwgBEgNBTVMSCjIwMjMtMDctMTNyBwgBEgNKRktwAYIBCwj___________8BQAFIAZgBAQ').text
soup = BeautifulSoup(html_text, 'lxml')
flights = soup.find_all('li', class_ = 'pIav2d')
for flight in flights:
    czas = flight.find('span', class_ = 'mv1WYe').text
    stops = flight.find('div', class_ = 'EfT7Ae AdWm1c tPgKwe').text
    cheapP = flight.find('div', class_ = 'YMlIz FpEdX jLMuyc').text
    Reg_price = flight.find('div', class_ = 'YMlIz FpEdX').text

    print(f'''
    Time: {czas}
    Stops: {stops}
    Cheapest: {cheapP}
    Regular Price: {Reg_price}
    ''')

The problem is the line Reg_price = flight.find('div', class_ = 'YMlIz FpEdX').text. With .text on the end, I get the error AttributeError: 'NoneType' object has no attribute 'text'.

I know it is the correct identifier because I'm using XPath Helper, and when I run //div[@class='YMlIz FpEdX'] in it, I get the correct results.
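A likely reason only some rows come back as None: when class_ is given a string containing spaces, BeautifulSoup compares it against the element's full class attribute string, not against individual classes. XPath's @class='...' test is an exact string comparison as well. So a row whose price div has class="YMlIz FpEdX jLMuyc" contains no 'YMlIz FpEdX' match at all. The sketch below uses a small hypothetical HTML snippet (assuming the cheapest-fare row carries the extra jLMuyc class, as the class names in the question suggest) and the built-in html.parser instead of lxml:

```python
from bs4 import BeautifulSoup

# Hypothetical markup mimicking two flight rows; the real Google Flights HTML may differ.
html = """
<ul>
  <li class="pIav2d"><div class="YMlIz FpEdX jLMuyc">$350</div></li>
  <li class="pIav2d"><div class="YMlIz FpEdX">$420</div></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")
rows = soup.find_all("li", class_="pIav2d")

# A class_ string containing spaces is compared with the exact attribute value,
# so "YMlIz FpEdX" does not match an element whose class is "YMlIz FpEdX jLMuyc".
exact = [row.find("div", class_="YMlIz FpEdX") for row in rows]
print(exact[0])       # None, so calling .text on it raises AttributeError
print(exact[1].text)  # $420

# A CSS selector matches on class membership instead, so it finds the div in both rows.
by_membership = [row.select_one("div.YMlIz.FpEdX") for row in rows]
print([tag.text for tag in by_membership])  # ['$350', '$420']
```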

Edit: I figured out what the problem is: I need to write some sort of condition or if statement, and I need help with that.

Basically, some of the li elements returned by flights = soup.find_all('li', class_ = 'pIav2d') contain the cheapP and Reg_price values and some don't. When a value is missing, I want the code to print None and keep running until the end. What should my if statement look like?
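A minimal sketch of the if-statement approach, assuming the same class names as in the question; the two li rows below are hypothetical sample markup, not real Google Flights output. The idea is to store the result of find() in a variable and only read .text when it is not None:

```python
from bs4 import BeautifulSoup

# Hypothetical sample rows; in the real script `flights` comes from the downloaded page.
html = """
<li class="pIav2d"><span class="mv1WYe">7:05 AM</span><div class="YMlIz FpEdX">$420</div></li>
<li class="pIav2d"><span class="mv1WYe">9:40 PM</span></li>
"""
soup = BeautifulSoup(html, "html.parser")
flights = soup.find_all("li", class_="pIav2d")

results = []
for flight in flights:
    czas = flight.find("span", class_="mv1WYe").text
    # find() returns None when nothing matches, so check the tag before reading .text.
    reg_tag = flight.find("div", class_="YMlIz FpEdX")
    if reg_tag is not None:
        Reg_price = reg_tag.text
    else:
        Reg_price = None
    results.append((czas, Reg_price))
    print(f"Time: {czas}  Regular Price: {Reg_price}")

# Output:
# Time: 7:05 AM  Regular Price: $420
# Time: 9:40 PM  Regular Price: None
```

The if/else can also be written as one conditional expression: Reg_price = reg_tag.text if reg_tag is not None else None.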


Solution

  • Not every flight row actually contains the cheapP and Reg_price elements. When find() finds no match it returns None, and reading .text from None raises AttributeError. You can handle these two values with try/except blocks as follows:

    from bs4 import BeautifulSoup
    import requests
    
    html_text = requests.get('https://www.google.com/travel/flights/search?tfs=CBwQAhoeagcIARIDSkZLEgoyMDIzLTA3LTAzcgcIARIDQU1TGh5qBwgBEgNBTVMSCjIwMjMtMDctMTNyBwgBEgNKRktwAYIBCwj___________8BQAFIAZgBAQ').text
    soup = BeautifulSoup(html_text, 'lxml')
    flights = soup.find_all('li', class_ = 'pIav2d')
    for flight in flights:
        czas = flight.find('span', class_ = 'mv1WYe').text
        stops = flight.find('div', class_ = 'EfT7Ae AdWm1c tPgKwe').text
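        # find() returns None when this row has no matching div;
        # calling .text on None raises AttributeError, so fall back to None.
        try:
            cheapP = flight.find('div', class_ = 'YMlIz FpEdX jLMuyc').text
        except AttributeError:
            cheapP = None
        try:
            Reg_price = flight.find('div', class_ = 'YMlIz FpEdX').text
        except AttributeError:
            Reg_price = None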
    
        print(f'''
        Time: {czas}
        Stops: {stops}
        Cheapest: {cheapP}
        Regular Price: {Reg_price}
        ''')
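If several fields can be missing (czas and stops as well), repeating try/except for each one gets verbose. One alternative, sketched here with a hypothetical helper named text_or_none and sample markup, uses select_one(), which also returns None when nothing matches. The selector div.YMlIz.FpEdX:not(.jLMuyc) is an assumption meant to approximate class_ = 'YMlIz FpEdX'; unlike the exact string match, it would also accept a div that carries some other extra class.

```python
from typing import Optional

from bs4 import BeautifulSoup
from bs4.element import Tag


def text_or_none(parent: Tag, selector: str) -> Optional[str]:
    """Hypothetical helper: stripped text of the first match for `selector`, or None."""
    tag = parent.select_one(selector)
    return tag.get_text(strip=True) if tag is not None else None


# Hypothetical markup standing in for the downloaded page.
html = """
<ul>
  <li class="pIav2d"><span class="mv1WYe">7:05 AM</span>
      <div class="EfT7Ae AdWm1c tPgKwe">Nonstop</div>
      <div class="YMlIz FpEdX jLMuyc">$350</div></li>
  <li class="pIav2d"><span class="mv1WYe">9:40 PM</span>
      <div class="EfT7Ae AdWm1c tPgKwe">1 stop</div>
      <div class="YMlIz FpEdX">$420</div></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

rows = []
for flight in soup.select("li.pIav2d"):
    rows.append({
        "time": text_or_none(flight, "span.mv1WYe"),
        "stops": text_or_none(flight, "div.EfT7Ae.AdWm1c.tPgKwe"),
        # Cheapest fare: the div that also carries jLMuyc.
        "cheapest": text_or_none(flight, "div.YMlIz.FpEdX.jLMuyc"),
        # Regular fare: both classes present but not jLMuyc.
        "regular": text_or_none(flight, "div.YMlIz.FpEdX:not(.jLMuyc)"),
    })

for row in rows:
    print(row)
```

Every field goes through the same None check, so a missing element never stops the loop.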