I want to scrape columns 2 and 3 of the table on https://www.airvistara.com/fly/flightschedule. The code I used is:
import bs4 as bs
from urllib2 import urlopen
sauce=urlopen('https://www.airvistara.com/fly/flightschedule').read()
soup=bs.BeautifulSoup(sauce,'lxml')
table=soup.table
table_body=table.find('tbody')
table_rows=table_body.find_all('tr')
for tr in table_rows:
    td=tr.find_all('td')
    row=[i.text for i in td]
    print row
But I am not able to get the desired result.
The content you're trying to parse is loaded via AJAX after the page loads, so it is not in the HTML that bs (BeautifulSoup) receives.
Here's working code that gets the outbound flights as a Python dictionary:
import json
import requests
post_fields = {"flightDate":"22/04/2017"}
headers = {'content-type': 'application/json'}
url = 'https://www.airvistara.com/fly/getFlightschedule'
json_response = requests.post(url, data=json.dumps(post_fields), headers=headers).text
decoded_json = json.loads(json_response)
print decoded_json
Output:
{u'flightSchedule': [{u'effectiveFrom': u'19-APR-2017', u'flightCode': u'UK 0946', u'baseFareL1': 0, u'flightDate': u'Saturday, 28 October 2017',...
To get the details for each flight, you can use:
for flight in decoded_json['flightSchedule']:
    print flight['effectiveFrom']
    print flight['flightCode']
    print flight['baseFareL1']
    print flight['flightDate']
    print flight['daysOfOperation']
    print flight['arrivalStation']
    print flight['departureStation']
    print flight['via']
    print flight['scheduledArrivalTime']
    print flight['departureCityName']
    print flight['effectiveTo']
    print flight['arrivalCityName']
    print flight['scheduledDepartureTime']
Which will output something like:
19-APR-2017
UK 0946
0
Saturday, 28 October 2017
Daily
DEL
AMD
-
10:25
Ahmedabad
28-OCT-2017
New Delhi
08:45
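Since the question asks for only a couple of columns, here is a minimal sketch of picking selected fields out of the decoded response. It runs on a hand-written sample shaped like the output above, so no network call is needed. The helper name select_columns is hypothetical, and the choice of flightCode and departureStation is only an assumption about which columns are wanted; swap in whichever keys match the table columns you need.

```python
# Sample shaped like the response shown above; the values are copied from that output.
sample = {
    "flightSchedule": [
        {
            "flightCode": "UK 0946",
            "departureStation": "AMD",
            "arrivalStation": "DEL",
            "scheduledDepartureTime": "08:45",
            "scheduledArrivalTime": "10:25",
        }
    ]
}


def select_columns(schedule, keys):
    """Return one list per flight, holding only the requested keys (hypothetical helper)."""
    return [
        [flight.get(key, "") for key in keys]  # a missing key becomes an empty string
        for flight in schedule["flightSchedule"]
    ]


# Assumption: these two keys stand in for the two columns the question asks about.
rows = select_columns(sample, ["flightCode", "departureStation"])
for row in rows:
    print(row)
```

With the real response, pass decoded_json in place of sample.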
Notes:
1 - If you need to specify the arrivalStation or departureStation, use:
post_fields = {"flightDate":"22/04/2017","arrivalStation":"AIRPORTCODE","departureStation":"AIRPORTCODE"}
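As a rough sketch, the payload can be built by a small helper so the stations are only included when given. The function name build_post_fields is hypothetical, and the day/month/year date format is inferred from the "22/04/2017" example above rather than from any documentation of the endpoint.

```python
import datetime
import json


def build_post_fields(flight_date, departure_station=None, arrival_station=None):
    """Build the request payload as a dict (hypothetical helper, not part of any API)."""
    # Assumption: the endpoint expects DD/MM/YYYY, as in the "22/04/2017" example.
    fields = {"flightDate": flight_date.strftime("%d/%m/%Y")}
    if departure_station:
        fields["departureStation"] = departure_station
    if arrival_station:
        fields["arrivalStation"] = arrival_station
    return fields


post_fields = build_post_fields(
    datetime.date(2017, 4, 22),
    departure_station="AMD",
    arrival_station="DEL",
)
# This string is what would be passed as data= in the requests.post call above.
body = json.dumps(post_fields)
```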