html vb6 web-scraping

Data not fully loaded in HTML


I am trying to build a scraper in VB6. My technique is to search the HTML page with a function that returns the text between two given strings.

The function is tested and works correctly on every site I have tried, except for one new site where the same technique fails.

The problem is that the HTML does not contain the data. A piece of the HTML is shown below:

<tr>
<td valign="top" nowrap="nowrap" class="label">Company Name:</td>
<td><span class="search-custom" id="synopsisDetailsOppNum"></span></td>
</tr>

The value should appear inside the span tag above, but it is missing from the HTML source, as the code above shows.
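To make the symptom concrete, here is a minimal sketch in Python (not the original VB6 code) of a "get between two texts" helper run against the fragment above. The name `get_between` and its exact behavior are assumptions made for illustration; the real function may differ.

```python
def get_between(text, start, end):
    """Return the text between the first `start` marker and the next `end` marker.

    Hypothetical stand-in for the VB6 helper described in the question.
    Returns None when either marker is missing.
    """
    i = text.find(start)
    if i == -1:
        return None
    i += len(start)
    j = text.find(end, i)
    if j == -1:
        return None
    return text[i:j]


# The fragment as it arrives from the server, before any JavaScript runs.
page_source = '''<tr>
<td valign="top" nowrap="nowrap" class="label">Company Name:</td>
<td><span class="search-custom" id="synopsisDetailsOppNum"></span></td>
</tr>'''

value = get_between(page_source, 'id="synopsisDetailsOppNum">', '</span>')
print(repr(value))  # prints '' because the span is empty in the downloaded HTML
```

Both markers are found, so the helper itself is not at fault; the text between them is simply empty in what the server sent.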

The website uses JavaScript to manage the data.

I have also tried using a wait function, in case the data would appear in the HTML after a delay, but that failed too.

Is there any way to get the value? A VB.NET solution is fine too, since I can update my code.


Solution

  • That website uses JavaScript to add data to the web page, and changes made that way do not show up in the page source.

    The following is quoted from JavaScript & jQuery: The Missing Manual by David Sawyer McFarland:

    One problem with using JavaScript to manipulate the DOM by adding, changing, deleting, and rearranging HTML code is that it’s hard to figure out what the HTML of a page looks like when JavaScript is finished. For example, the View Source command available in every browser only shows the web page file as it was downloaded from the web server. In other words, you see the HTML before it was changed by JavaScript, which can make it very hard to figure out if the JavaScript you’re writing is really producing the HTML you’re after. For example, if you could see what the HTML of your page looks like after your JavaScript adds 10 error messages to a form page, or after your JavaScript program creates an elaborate pop-up dialog box complete with text and form fields, it would be a lot easier to see if you’re ending up with the HTML you want.

    Fortunately, most major browsers offer a set of developer tools that let you view the rendered HTML—the HTML that the browser displays after JavaScript has done its magic. Usually the tools appear as a pane at the bottom of the browser window, below the web page. Different tabs let you access JavaScript code, HTML, CSS, and other useful resources. The exact name of the tab and method for turning on the tools panel varies from browser to browser:

      • In Firefox, install the Firebug plug-in (discussed on page 477). Open a page with the JavaScript code you wish to see and open Firebug (Tools→Firebug→Open Firebug). Click the HTML tab in the Firebug panel, and you’ll see the complete DOM (including any HTML generated by JavaScript). Alternatively, you can use the Web Developer toolbar in Firefox to view both the regular HTML source, and the generated HTML.

      • In IE 9, press the F12 key to open the Developer Tools panel, then click the HTML tab to see the page’s HTML. In the case of IE9, the HTML tab starts by showing the downloaded HTML (the same as the View Source command). But if you click the refresh icon (or press F5), the HTML tab shows the rendered HTML complete with any JavaScript-created changes.

      • In Chrome, select View→Developer→Developer Tools and click the Elements tab in the panel at the bottom of the browser window.

      • In Safari, make sure the Developer menu is on (choose Safari→Preferences, click the Advanced button, and make sure the “Show Develop menu in menu bar” is checked). Then open the page you’re interested in looking at, and choose Develop→Show Web Inspector. Click the Elements tab in the panel that appears at the bottom of the browser window.

      • In Opera, choose Tools→Advanced→Opera Dragonfly. (Dragonfly is the name of Opera’s built-in set of developer tools.) In the panel that appears at the bottom of the browser window, click the Documents tab.

    So the scraper does not download the page as it looks after the JavaScript has finished; it gets the HTML as it was before any JavaScript manipulation.

    You can watch Michael Schrenk's talk, Screen Scraper Tricks: Extracting Data from Difficult Websites:

    http://www.youtube.com/watch?v=NtffxCi8aq4
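As a rough diagnostic for the situation described above, a scraper can check the downloaded HTML for elements that carry an `id` but contain no text, since those are likely placeholders that JavaScript fills in later. The sketch below uses Python's standard `html.parser` module. The class and function names are hypothetical, and an empty element is only a hint, not proof, that a script populates it.

```python
from html.parser import HTMLParser


class EmptyPlaceholderFinder(HTMLParser):
    """Collect ids of elements that have an id attribute but no text inside."""

    def __init__(self):
        super().__init__()
        self._stack = []      # one [tag, id, has_text] entry per open element
        self.empty_ids = []

    def handle_starttag(self, tag, attrs):
        self._stack.append([tag, dict(attrs).get("id"), False])

    def handle_data(self, data):
        # Non-whitespace text counts as content for every open ancestor.
        if data.strip():
            for entry in self._stack:
                entry[2] = True

    def handle_endtag(self, tag):
        # Ignore stray closing tags that match nothing currently open.
        if not any(entry[0] == tag for entry in self._stack):
            return
        # Pop up to the matching element; unclosed tags such as <br> are dropped.
        while self._stack:
            open_tag, elem_id, has_text = self._stack.pop()
            if open_tag == tag:
                if elem_id and not has_text:
                    self.empty_ids.append(elem_id)
                break


def find_empty_placeholders(html):
    """Return the ids of empty id-bearing elements, in closing order."""
    finder = EmptyPlaceholderFinder()
    finder.feed(html)
    finder.close()
    return finder.empty_ids


# The fragment from the question, as downloaded before any JavaScript runs.
page_source = '''<tr>
<td valign="top" nowrap="nowrap" class="label">Company Name:</td>
<td><span class="search-custom" id="synopsisDetailsOppNum"></span></td>
</tr>'''

print(find_empty_placeholders(page_source))  # prints ['synopsisDetailsOppNum']
```

If a field the scraper needs shows up in this list, waiting longer on the same download will not help, because the value was never in the response. The scraper would need the rendered DOM instead, for example from a tool that actually executes the page's JavaScript.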