python  python-2.7  web-scraping  beautifulsoup  urllib2

BeautifulSoup4 not able to scrape data from table


I want to scrape columns 2 and 3 of the table on https://www.airvistara.com/fly/flightschedule. The code I used is:

import bs4 as bs
from urllib2 import urlopen

sauce=urlopen('https://www.airvistara.com/fly/flightschedule').read()
soup=bs.BeautifulSoup(sauce,'lxml')
table=soup.table
table_body=table.find('tbody')
table_rows=table_body.find_all('tr')
for tr in table_rows:
    td=tr.find_all('td')
    row=[i.text for i in td]
    print row

But I am not able to get the desired output.


Solution

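  • The content you're trying to parse is loaded via AJAX after the page loads, so it is not in the HTML that urlopen returns and BeautifulSoup never sees it.
    Here's working code that requests the same data directly and gets the outbound flights as a Python dictionary: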

    import json
    import requests
    
    post_fields = {"flightDate":"22/04/2017"}
    headers = {'content-type': 'application/json'}
    url = 'https://www.airvistara.com/fly/getFlightschedule'
    json_response = requests.post(url, data=json.dumps(post_fields), headers=headers).text
    decoded_json = json.loads(json_response)
    print decoded_json
    

    Output:

    {u'flightSchedule': [{u'effectiveFrom': u'19-APR-2017', u'flightCode': u'UK 0946', u'baseFareL1': 0, u'flightDate': u'Saturday, 28 October 2017',...
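
The request above can also be wrapped in a small function that fails loudly on an HTTP error instead of trying to decode an error page. This is only a sketch: `fetch_schedule` and its `post` parameter are hypothetical names introduced here, the URL and payload are the ones from the answer, and it assumes the endpoint still accepts this request. It runs on both Python 2.7 and Python 3.

```python
import json

URL = 'https://www.airvistara.com/fly/getFlightschedule'  # endpoint from the answer


def fetch_schedule(flight_date, post=None, timeout=30):
    """POST the date (DD/MM/YYYY string) and return the decoded JSON dict.

    `post` is a hypothetical hook so the function can be tried without
    network access; by default it is requests.post.
    """
    if post is None:
        import requests  # imported lazily so a stand-in `post` needs no requests install
        post = requests.post
    response = post(
        URL,
        data=json.dumps({"flightDate": flight_date}),
        headers={'content-type': 'application/json'},
        timeout=timeout,
    )
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
    return response.json()       # same result as json.loads(response.text)
```

Calling `fetch_schedule("22/04/2017")` should then give the same dictionary as `decoded_json`, provided the site still serves this endpoint.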
    

    To get the details for each flight, you can use:

    for flight in decoded_json['flightSchedule']:
        print flight['effectiveFrom']
        print flight['flightCode']
        print flight['baseFareL1']
        print flight['flightDate']
        print flight['daysOfOperation']
        print flight['arrivalStation']
        print flight['departureStation']
        print flight['via']
        print flight['scheduledArrivalTime']
        print flight['departureCityName']
        print flight['effectiveTo']
        print flight['arrivalCityName']
        print flight['scheduledDepartureTime']
    

    Which will output something like:

    19-APR-2017
    UK 0946
    0
    Saturday, 28 October 2017
    Daily
    DEL
    AMD
    -
    10:25
    Ahmedabad
    28-OCT-2017
    New Delhi
    08:45
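
Since the question only needs a couple of columns, the loop above can be narrowed to the fields of interest. The sketch below uses a hypothetical helper, `select_fields`, and a sample record copied from the output shown above. Which JSON keys correspond to "column 2 and 3" of the page's table is an assumption, so adjust the key list as needed.

```python
from __future__ import print_function  # lets the same code run on Python 2.7 and 3


def select_fields(flights, fields):
    """Return one list per flight, holding only the requested keys in order.

    Keys missing from a record come back as None.
    """
    return [[flight.get(name) for name in fields] for flight in flights]


# Sample record copied from the output above; with a live response you
# would pass decoded_json['flightSchedule'] instead.
sample = [{
    'effectiveFrom': '19-APR-2017',
    'flightCode': 'UK 0946',
    'baseFareL1': 0,
    'flightDate': 'Saturday, 28 October 2017',
    'daysOfOperation': 'Daily',
    'arrivalStation': 'DEL',
    'departureStation': 'AMD',
    'via': '-',
    'scheduledArrivalTime': '10:25',
    'departureCityName': 'Ahmedabad',
    'effectiveTo': '28-OCT-2017',
    'arrivalCityName': 'New Delhi',
    'scheduledDepartureTime': '08:45',
}]

# Assumed columns of interest; swap in whichever keys you actually need.
rows = select_fields(sample, ['flightCode', 'departureCityName', 'arrivalCityName'])
for row in rows:
    print(row)
```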
    

    Notes:
    1 - If you need to specify the arrivalStation or departureStation, use:

    post_fields = {"flightDate":"22/04/2017","arrivalStation":"AIRPORTCODE","departureStation":"AIRPORTCODE"}
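
As a sketch of how that payload might be built without hand-writing the date string: `build_post_fields` is a hypothetical helper, and it assumes the server expects the date as DD/MM/YYYY (as in "22/04/2017") and accepts the three-letter station codes seen in the response (for example AMD and DEL) in place of AIRPORTCODE.

```python
import datetime
import json


def build_post_fields(flight_date, departure_station=None, arrival_station=None):
    """Build the POST payload; station codes are included only when given."""
    fields = {"flightDate": flight_date.strftime("%d/%m/%Y")}  # e.g. 22/04/2017
    if departure_station:
        fields["departureStation"] = departure_station
    if arrival_station:
        fields["arrivalStation"] = arrival_station
    return fields


payload = build_post_fields(
    datetime.date(2017, 4, 22),
    departure_station="AMD",  # assumed code, taken from the sample output
    arrival_station="DEL",    # assumed code, taken from the sample output
)
body = json.dumps(payload)  # string to pass as data= in requests.post
```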