Search code examples
pythonweb-scrapingscrapyscrapy-splash

Can I scrape this kind of website?


I am learning python right now and I want to level up my knowledge on it particularly scraping. I am now on using Scrapy and getting in to use it along with Splash. I wanted to scrape a more challenging website - an airline website "https://www.airasia.com/en/home.page?cid=1" - one of my web developer friend told me that it would be impossible to scrape this type of websites since no regular json or xml files are returned for the data to be scrape. He said data can only be access using API (he said something about RESTFUL API) I don't somehow believe him. So as not wasting my time, if someone can CONFIRM it, I would be happy and if someone would say it can be scraped, I would be more happy if that guy can give me tips on how to scrape it and hands down if that guy can show proofs..

Many thanks.


Solution

  • Almost ANY website can be scraped but some websites are trickier than others.

    Instead of Scrapy, I would recommend using a better alternative called Selenium which happens to have a library for python as well.

    Long story made short: You will start a web browser in form of a driver and navigate to the page of your choice and simulate user interactions such as clicking, entering data in forms and submission. You will also be able to run JavaScript functions.

    You might also want to do some research on legal constraints to ensure your operation is not unlawful. For instance, refer to case law of Ryanair Ltd v PR Aviation BV (Case C-30/14 CJEU).