php javascript screen-scraping

Following a JavaScript link when scraping a remote site using PHP


Given a remote page:

http://example.com/paged_list.aspx

which uses JavaScript function calls in its links to page through tabular data:

javascript: show_page(1)
javascript: show_page(2)

and so on. Users click the page links to display each page, which triggers a reload with no query string, i.e. the URI stays the same.

When scraping this site, it would be useful to fetch the subsequent pages, but there is no obvious way to specify a page number in the request passed to file_get_contents().

Is there any way to:

  1. Open a remote web address.
  2. Call a known JavaScript function at that address.
  3. Return the results?

Solution

  • Emulating JS in PHP would be the tough route. It is much easier to read the JS source for show_page() and work out which URL and parameters the page request actually targets (a background AJAX call, or a form POST if the page really reloads). Once you know those, it should be fairly easy to pull the entire data set into your PHP script by requesting that URL yourself and changing the arguments for each page.
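
As a rough sketch of that approach, suppose inspecting show_page() shows that it POSTs a form field back to the same URL. The field name `page` and the helpers `build_post_options()` and `fetch_page()` below are made up for illustration; the real names have to come from the site's JS source or the browser's network tab. An .aspx page may also expect hidden form fields (such as `__VIEWSTATE`) to be sent back, which this sketch does not handle.

```php
<?php
// Sketch only: assumes show_page(n) sends a form POST with a field
// named "page" (hypothetical) to the same URL the list is served from.

/**
 * Build stream-context options for a form-encoded POST request.
 */
function build_post_options(array $fields): array
{
    return [
        'http' => [
            'method'  => 'POST',
            'header'  => "Content-Type: application/x-www-form-urlencoded\r\n",
            'content' => http_build_query($fields),
            'timeout' => 10, // seconds
        ],
    ];
}

/**
 * Fetch one page of the list. Returns the response body as a string,
 * or false if the request fails.
 */
function fetch_page(string $url, int $page)
{
    // "page" is an assumed parameter name, not taken from the real site.
    $context = stream_context_create(build_post_options(['page' => $page]));
    return file_get_contents($url, false, $context);
}
```

Calling `fetch_page('http://example.com/paged_list.aspx', 2)` would then stand in for clicking the second page link, and a loop over page numbers can collect the whole table. If the JS turns out to issue a GET instead, the same idea applies with the arguments appended to the URL as a query string.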