Search code examples
pythonscreen-scraping

Log Into Website and Scrape Streaming Data


I am not really a programmer but am asking this out of general curiosity. I visited a website recently where I logged in, went to a page, and without leaving, data on that page refreshes before my eyes.

Is it possible to mimic a browser (I was using Chrome) and log into the site, navigate to a page, and "scrape" that data that is coming in using Python? I would like to store and analyze it.

If so, taking this one step further, is it possible to interact with the website? Click a button that I know the name of?

Thanks in advance.


Solution

  • If the data "refreshes before your eyes" it is probably AJAX (javascript in the page pulling new page-data from the server).

    There are two ways of approaching this;

    1. using Selenium you can wrap an actual browser which will load the page, run the javascript, then you can grab page-bits from the active page.

    2. you can look at what the AJAX in the page is doing (how it is asking for updates, what it is getting back) and write python code to emulate that.

    both take a fair bit of of time and effort to set up; Selenium is a bit more robust, direct python queries is a bit more efficient, YMMV.