
How to Post & Retrieve Data from Website


I am working on a Windows Forms application. I have a textbox called "tbPhoneNumber" which contains a phone number.

I want to go to the website http://canada411.com, enter the number from my textbox into the website's textbox (ID "c411PeopleReverseWhat"), and then somehow send a click to "Find" (an input belonging to class "c411ButtonImg").

After that, I want to retrieve what is between the asterisks in the following HTML section:

<div id="contact" class="vcard">
        <span><h1 class="fn c411ListedName">**Full Name**</h1></span>
        <span class="c411Phone">**(###)-###-####**</span>
        <span class="c411Address">**Address**</span>
        <span class="adr">
            <span class="locality">**City**</span>
            <span class="region">**Province**</span>
            <span class="postal-code">**L#L#L#**</span>
        </span>

So basically I am trying to send data into an input box, click the input button, and store the retrieved values in variables. I want to do this seamlessly, so would I need something like an HttpWebRequest? Or do I use a WebBrowser object? I just don't want the user to see that the application is visiting a website.


Solution

  • I do a good amount of website scraping and I will show you how I do it. Feel free to skip ahead if I am being too detailed, but this is a commonly asked question and deserves a specific answer.

    URL Simplification

    The library I use for this is HtmlAgilityPack (it is a DLL: make a new project and add a reference to it, for example through the NuGet package of the same name). The first thing to check is whether we have to take any special steps to get to a page from a phone number. I searched for John Smith and found quite a few results. I opened two of them and noticed that the URL format is very simple. Those results were:

    http://www.canada411.ca/res/7056736767/John-Smith/138223109.html

    http://www.canada411.ca/res/7052355273/John-Smith/172439951.html

    I tested whether I could remove the values I don't know from the URL and keep just the phone number. It turns out I can:

    http://www.canada411.ca/search/re/1/7056736767/-

    http://www.canada411.ca/search/re/1/7052355273/-

    You can see that the URL consists of some static parts plus our phone number. From this, let's construct a string for the URL.

    Dim phoneNumber As String = "7056736767" 'this could be TextBox1.Text or whatever
    Dim URL As String = "http://www.canada411.ca/search/re/1/" & phoneNumber & "/-"
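
    Since the phone number will usually come from a textbox, it may arrive with spaces, brackets, or dashes. Below is a minimal sketch of cleaning it up first, assuming the search URL wants only the bare digits, as in the example URLs above. BuildLookupUrl is a hypothetical helper name, not something provided by the site or by any library.

    ```vbnet
    Imports System
    Imports System.Text.RegularExpressions

    Module UrlBuilder
        'Hypothetical helper: strips everything except the digits 0-9,
        'then drops the result into the static URL pattern found above.
        Function BuildLookupUrl(rawPhone As String) As String
            Dim digits As String = Regex.Replace(rawPhone, "[^0-9]", "")
            Return "http://www.canada411.ca/search/re/1/" & digits & "/-"
        End Function

        Sub Main()
            'Prints http://www.canada411.ca/search/re/1/7056736767/-
            Console.WriteLine(BuildLookupUrl("(705) 673-6767"))
        End Sub
    End Module
    ```

    In the form you would pass tbPhoneNumber.Text instead of the literal. The sketch does not handle a leading country code such as "1-705-...", so add that if your users type numbers that way.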
    

    Value Extraction with XPath

    Now that we have the page dialed in, let's examine the HTML you provided above. You need 6 values from the page, so we will declare a variable for each of them now:

    Dim FullName As String
    Dim Phone As String
    Dim Address As String
    Dim Locality As String
    Dim Region As String
    Dim PostalCode As String  
    

    As mentioned above, we will be using HtmlAgilityPack, which uses XPath. The cool thing about this is that once we find some unique identifier in the HTML, we can use XPath to find our values. I know it may be confusing, but it will become clearer.

    All of the values you need are within tags that have a class name. Let's use the class names in our XPath expressions to find them.

    Dim FullNameXPath As String = "//*[@class='fn c411ListedName']"
    Dim PhoneXPath As String = "//*[@class='c411Phone']"
    Dim AddressXPath As String = "//*[@class='c411Address']"
    Dim LocalityXPath As String = "//*[@class='locality']"
    Dim RegionXPath As String = "//*[@class='region']"
    Dim PostalCodeXPath As String = "//*[@class='postal-code']"
    

    Essentially, each of these is a string that tells HtmlAgilityPack what to look for: in our case, the elements that carry the class names we listed, whose text we will read. There is a lot to XPath and it would take a while to explain all of it. As a side note, if you use Google Chrome you can right-click a value on a page and choose Inspect element. In the markup that appears, you can right-click the element and copy its XPath. Very useful.
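
    One thing worth knowing about the expressions above: @class='c411Phone' compares the whole class attribute, so it stops matching if the site ever adds a second class to that element. A common XPath 1.0 workaround is to match a single class token instead. The sketch below shows that idiom. ClassXPath is a hypothetical helper name, and the approach is an alternative to the exact-match strings used in this answer, not a replacement the site requires.

    ```vbnet
    Imports System

    Module XPathHelpers
        'Hypothetical helper: builds an XPath that matches any element whose
        'class attribute contains className as one whitespace-separated token.
        Function ClassXPath(className As String) As String
            Return "//*[contains(concat(' ', normalize-space(@class), ' '), ' " & className & " ')]"
        End Function

        Sub Main()
            'Prints //*[contains(concat(' ', normalize-space(@class), ' '), ' c411Phone ')]
            Console.WriteLine(ClassXPath("c411Phone"))
        End Sub
    End Module
    ```

    With this form, ClassXPath("c411ListedName") would also find the name heading, whose class attribute is "fn c411ListedName".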

    Basic HTMLAgilityPack Template

    Now, all that is left is to connect to the page and get those variables populated.

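    Dim Web As New HtmlAgilityPack.HtmlWeb
    Dim Doc As New HtmlAgilityPack.HtmlDocument
    Doc = Web.Load(URL)
    'SelectNodes returns Nothing when no node matches, so check before looping
    Dim nameNodes As HtmlAgilityPack.HtmlNodeCollection = Doc.DocumentNode.SelectNodes(FullNameXPath)
    If nameNodes IsNot Nothing Then
        For Each nameResult As HtmlAgilityPack.HtmlNode In nameNodes
            MsgBox(nameResult.InnerText)
        Next
    End If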
    

    In the above example we create an HtmlWeb object named Web. This is the actual crawler of our project. We then define an HtmlDocument, which will hold the converted, searchable page source. All of this is done behind the scenes. We then send Web to get the page source and assign it to the Doc object we created. Doc is reusable, so thankfully we only need to connect to the page once.

    We then ask Doc for every node that matches FullNameXPath, which was defined previously as the XPath for finding the name. The For Each loop assigns each matching node to the nameResult variable, and inside the loop we call a message box to display the node's inner text. Be aware that SelectNodes returns Nothing, not an empty collection, when nothing matches, so a lookup with no listing needs a check before the loop.
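
    If you want to try XPath expressions without hitting the site each time, HtmlDocument can also parse a string directly with LoadHtml. The sketch below does that with a trimmed copy of the markup from the question, placeholder text left as-is, and queries the same document twice. The module name is made up for the example, and the Nothing result reflects HtmlAgilityPack's default behaviour as I understand it.

    ```vbnet
    Imports System

    Module OfflineXPathCheck
        Sub Main()
            Dim doc As New HtmlAgilityPack.HtmlDocument()
            'Trimmed markup from the question, with its placeholder text left in place
            doc.LoadHtml("<div id=""contact"" class=""vcard"">" &
                         "<span class=""c411Phone"">(###)-###-####</span>" &
                         "<span class=""locality"">City</span></div>")

            'One parsed document can be queried as many times as needed
            Dim phone As HtmlAgilityPack.HtmlNode =
                doc.DocumentNode.SelectSingleNode("//*[@class='c411Phone']")
            Console.WriteLine(phone.InnerText)

            'Nothing in this markup has class 'region', so no collection comes back
            Dim regions As HtmlAgilityPack.HtmlNodeCollection =
                doc.DocumentNode.SelectNodes("//*[@class='region']")
            Console.WriteLine(regions Is Nothing)
        End Sub
    End Module
    ```

    The second WriteLine should report True, which is why looping straight over the result of SelectNodes can throw when a number has no listing.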

    Putting it all together:

    Complete Working Code (As of 2/17/2013)

    Dim phoneNumber As String = "7056736767" 'this could be TextBox1.Text or whatever
    Dim URL As String = "http://www.canada411.ca/search/re/1/" & phoneNumber & "/-"
    Dim FullName As String
    Dim Phone As String
    Dim Address As String
    Dim Locality As String
    Dim Region As String
    Dim PostalCode As String
    Dim FullNameXPath As String = "//*[@class='fn c411ListedName']"
    Dim PhoneXPath As String = "//*[@class='c411Phone']"
    Dim AddressXPath As String = "//*[@class='c411Address']"
    Dim LocalityXPath As String = "//*[@class='locality']"
    Dim RegionXPath As String = "//*[@class='region']"
    Dim PostalCodeXPath As String = "//*[@class='postal-code']"
    Dim Web As New HtmlAgilityPack.HtmlWeb
    Dim Doc As New HtmlAgilityPack.HtmlDocument
    Doc = Web.Load(URL)
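    'SelectNodes returns Nothing when nothing matches, so check before each loop
    Dim nameNodes As HtmlAgilityPack.HtmlNodeCollection = Doc.DocumentNode.SelectNodes(FullNameXPath)
    If nameNodes IsNot Nothing Then
        For Each nameResult As HtmlAgilityPack.HtmlNode In nameNodes
            FullName = nameResult.InnerText
            MsgBox(FullName)
        Next
    End If
    Dim phoneNodes As HtmlAgilityPack.HtmlNodeCollection = Doc.DocumentNode.SelectNodes(PhoneXPath)
    If phoneNodes IsNot Nothing Then
        For Each PhoneResult As HtmlAgilityPack.HtmlNode In phoneNodes
            Phone = PhoneResult.InnerText
            MsgBox(Phone)
        Next
    End If
    Dim addressNodes As HtmlAgilityPack.HtmlNodeCollection = Doc.DocumentNode.SelectNodes(AddressXPath)
    If addressNodes IsNot Nothing Then
        For Each ADDRResult As HtmlAgilityPack.HtmlNode In addressNodes
            Address = ADDRResult.InnerText
            MsgBox(Address)
        Next
    End If
    Dim localityNodes As HtmlAgilityPack.HtmlNodeCollection = Doc.DocumentNode.SelectNodes(LocalityXPath)
    If localityNodes IsNot Nothing Then
        For Each LocalResult As HtmlAgilityPack.HtmlNode In localityNodes
            Locality = LocalResult.InnerText
            MsgBox(Locality)
        Next
    End If
    Dim regionNodes As HtmlAgilityPack.HtmlNodeCollection = Doc.DocumentNode.SelectNodes(RegionXPath)
    If regionNodes IsNot Nothing Then
        For Each RegionResult As HtmlAgilityPack.HtmlNode In regionNodes
            Region = RegionResult.InnerText
            MsgBox(Region)
        Next
    End If
    Dim postalCodeNodes As HtmlAgilityPack.HtmlNodeCollection = Doc.DocumentNode.SelectNodes(PostalCodeXPath)
    If postalCodeNodes IsNot Nothing Then
        For Each postalCodeResult As HtmlAgilityPack.HtmlNode In postalCodeNodes
            PostalCode = postalCodeResult.InnerText
            MsgBox(PostalCode)
        Next
    End If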
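
    The six blocks above all follow the same pattern, so one possible refactoring is to keep the XPath strings in a dictionary and walk it once. This is only a sketch of that idea, not a drop-in replacement: ExtractFields is a hypothetical helper, and it uses SelectSingleNode to take the first match for each field instead of looping over every match as the code above does.

    ```vbnet
    Imports System
    Imports System.Collections.Generic

    Module ListingFields
        'Hypothetical helper (not part of HtmlAgilityPack):
        'returns field name -> trimmed text for every field found in doc.
        Function ExtractFields(doc As HtmlAgilityPack.HtmlDocument) As Dictionary(Of String, String)
            Dim xpaths As New Dictionary(Of String, String) From {
                {"FullName", "//*[@class='fn c411ListedName']"},
                {"Phone", "//*[@class='c411Phone']"},
                {"Address", "//*[@class='c411Address']"},
                {"Locality", "//*[@class='locality']"},
                {"Region", "//*[@class='region']"},
                {"PostalCode", "//*[@class='postal-code']"}
            }
            Dim results As New Dictionary(Of String, String)
            For Each pair As KeyValuePair(Of String, String) In xpaths
                Dim node As HtmlAgilityPack.HtmlNode = doc.DocumentNode.SelectSingleNode(pair.Value)
                'Fields missing from the page are simply left out of the result
                If node IsNot Nothing Then results(pair.Key) = node.InnerText.Trim()
            Next
            Return results
        End Function

        Sub Main()
            Dim web As New HtmlAgilityPack.HtmlWeb()
            'Same URL pattern as above; the page is fetched only once
            Dim doc As HtmlAgilityPack.HtmlDocument =
                web.Load("http://www.canada411.ca/search/re/1/7056736767/-")
            For Each field As KeyValuePair(Of String, String) In ExtractFields(doc)
                Console.WriteLine(field.Key & ": " & field.Value)
            Next
        End Sub
    End Module
    ```

    The caller can test results.ContainsKey("FullName") to tell whether the lookup found a listing at all, and adding a seventh value later means adding one dictionary entry, not another loop.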