I've been working on a project for some time now, and I'm struggling to pull some data from a website. The page contains an iframe that loads its data from an unknown source. Inside the iframe, the data sits in markup something like this:
<DIV id="number_forecast"><LABEL id="lblDay">9,000</LABEL></DIV>
There is a BUNCH of other crap above it, but this div id / label id combination is unique and is not used anywhere else in the page.
jsoup is probably what you want; it excels at extracting data from an HTML document.
There are many examples available showing how to use the API: http://jsoup.org/cookbook/extracting-data/selector-syntax
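The process has two steps: first load the outer page and find the iframe, then load the iframe's own document and extract the div from it.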
The code would look like this:
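import java.net.URL;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// let's find the iframe
// (inputstream and url are assumed to be the outer page's body and its address;
// url is passed as the base URI so relative links can be resolved)
Document document = Jsoup.parse(inputstream, "iso-8859-1", url);
// first() returns null if the page has no iframe
Element iframe = document.select("iframe").first();
// now load the iframe; absUrl resolves src against the base URI,
// and 15000 is the timeout in milliseconds
URL iframeUrl = new URL(iframe.absUrl("src"));
document = Jsoup.parse(iframeUrl, 15000);
// extract the div, then read the label's text ("9,000" in the example above)
Element div = document.getElementById("number_forecast");
String value = div.getElementById("lblDay").text();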
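If you want to try the extraction step without hitting the network, here is a minimal sketch that parses the snippet from the question as a string. It assumes jsoup 1.11.1 or later is on the classpath (for selectFirst); the class name ForecastExample, the method extractForecast, and the -1 "not found" value are made up for illustration, and it assumes the label always holds a whole number with comma thousands separators.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class ForecastExample {

    // Hypothetical helper: returns the label's number, or -1 if the label is missing.
    static int extractForecast(String html) {
        Document document = Jsoup.parse(html);
        // CSS selector: the element with id "lblDay" inside the element with id "number_forecast"
        Element label = document.selectFirst("#number_forecast #lblDay");
        if (label == null) {
            return -1;
        }
        // "9,000" -> "9000": strip the thousands separator before parsing
        return Integer.parseInt(label.text().replace(",", "").trim());
    }

    public static void main(String[] args) {
        String html = "<DIV id=\"number_forecast\"><LABEL id=\"lblDay\">9,000</LABEL></DIV>";
        System.out.println(extractForecast(html)); // prints 9000
    }
}
```

The uppercase tag names are not a problem, since jsoup's HTML parser normalizes them, but ids are matched case-sensitively, so "lblDay" has to be spelled exactly as it appears in the page.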