I have an old Java program that used to get data from an HTML page. It worked fine a few years ago, but now when I run it there is no data. The page link is:
http://www.batstrading.com/book/ibm/
My Java program still retrieves the HTML table, but the cells are empty. If you open the page in a browser, you can see the data changing dynamically. Why?
The HTML text my Java program now gets from the page matches what the browser's view source shows, and looks like this:
<tbody>
<tr>
<td class="shares"> </td>
<td class="price"> </td>
</tr>
Instead of data, the cells are empty.
How can I fix my code to get the data? To be clear, there is nothing wrong with the Java program itself: it retrieves the same text as the browser's view source. The data is missing because the page is now dynamic. So the question is how to use Java to get data from a dynamic page.
Scrap the current approach, since the site is now updated via JavaScript. Downloading the HTML alone won't work, because the table is filled in after the page loads.
However, a much easier approach (than using Selenium or a JS engine) is to request the source data that the JavaScript uses to update the page:
http://www.batstrading.com/json/bzx/book/IBM
It's perfectly valid JSON. Request that link with your HTTP client and parse the JSON using Jackson. This will yield very reliable results.
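As a rough illustration of the request step, here is a minimal sketch. It assumes Java 11+ (for `java.net.http`) and that the endpoint above is still served. The class name `BookFetcher` and its helper methods are hypothetical names made up for this example. The layout of the returned JSON is not shown here, so the sketch only prints the raw text and leaves the Jackson step as a comment.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.Locale;

public class BookFetcher {
    // Endpoint prefix taken from the answer above; not verified to still be live.
    public static final String BASE_URL = "http://www.batstrading.com/json/bzx/book/";

    // Hypothetical helper: builds a GET request for one ticker symbol, e.g. "IBM".
    public static HttpRequest buildRequest(String symbol) {
        return HttpRequest.newBuilder(URI.create(BASE_URL + symbol.toUpperCase(Locale.ROOT)))
                .timeout(Duration.ofSeconds(10))
                .header("Accept", "application/json")
                .GET()
                .build();
    }

    // Hypothetical helper: sends the request and returns the raw JSON text.
    public static String fetchBookJson(String symbol) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(10))
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpResponse<String> response =
                client.send(buildRequest(symbol), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws InterruptedException {
        try {
            String json = fetchBookJson("IBM");
            // With Jackson on the classpath, the text can be parsed into a tree:
            // JsonNode root = new ObjectMapper().readTree(json);
            System.out.println(json);
        } catch (IOException e) {
            System.err.println("Request failed: " + e.getMessage());
        }
    }
}
```

Once you have inspected the actual JSON, you can walk the Jackson tree or map it onto your own classes. Any field names would have to come from the real response rather than from this sketch.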
Disclaimer: make sure that what you are doing complies with the Terms of Service of the website you are using. Otherwise you may expose yourself to legal issues.