Tags: vb.net, webclient, vb.net-2010, alexa-internet

Display particular part of website


I'm creating a tool in VB.NET that displays a webpage's rank.
For that I use

Dim str As String = New WebClient().DownloadString("http://www.alexa.com/siteinfo/" & TextBox1.Text)

I only want the Global Rank of the URL entered in TextBox1.Text.

For example, here I enter example.com to check its Alexa global rank:

I just need to display the global ranking number in my VB form.


Solution

  • Please take a look at this:

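    Imports System.Diagnostics
    Imports System.Net
    Imports System.Text.RegularExpressions

    Module Program
        Sub Main()
            ' Download the raw HTML of the Alexa page for the site.
            Dim str As String = New WebClient().DownloadString("http://www.alexa.com/siteinfo/example.com")

            ' Match the <a> tags whose href starts with "/siteowners/certify" and capture their inner text.
            Dim pattern = "a href=""/siteowners/certify.+?\>(?<rank>[0-9,]+?)\<\/a\>"
            Dim r = New Regex(pattern, RegexOptions.IgnoreCase)
            Dim m As Match = r.Match(str)
            If m.Success Then
                Debug.Print("Global rank " + m.Groups("rank").Value)
                ' The next match, if there is one, holds the country rank.
                m = m.NextMatch()
                If m.Success Then
                    Debug.Print("Usa rank " + m.Groups("rank").Value)
                End If
            Else
                Debug.Print("Failed")
            End If
        End Sub
    End Module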
    

    On my computer the output is:

    Global rank 8,893
    Usa rank 10,060
    

    This code needs better error handling, but I guess it is OK as a proof of concept.

    Update: a few words on how it works:

    The code above uses regular expressions (please take a look at this link to get started) to parse the web page and extract the values you need.

    In the screenshot you provided, the ranks are stored in HTML <a> tags, which I identify by their href attribute: they are the only <a> tags on the page whose href starts with the string "/siteowners/certify". Hence, my regular expression matches the inner text of such a tag and captures it in a match group.
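
    To see the matching step on its own, without a network request, here is a minimal sketch that runs the same pattern over a hard-coded HTML fragment. The fragment, the `ExtractRanks` helper and the `RankDemo` module are made up for illustration; the real page markup may differ, so treat this as a demonstration of the regex rather than of the page itself.

    ```vbnet
    Imports System
    Imports System.Collections.Generic
    Imports System.Text.RegularExpressions

    Module RankDemo
        ' Hypothetical helper: returns every rank string found in the HTML, in page order.
        Function ExtractRanks(html As String) As List(Of String)
            ' Same idea as above: find <a> tags whose href starts with
            ' "/siteowners/certify" and capture the digits and commas inside them.
            Dim pattern = "a href=""/siteowners/certify.+?>(?<rank>[0-9,]+?)</a>"
            Dim ranks As New List(Of String)
            For Each m As Match In Regex.Matches(html, pattern, RegexOptions.IgnoreCase)
                ranks.Add(m.Groups("rank").Value)
            Next
            Return ranks
        End Function

        Sub Main()
            ' Made-up fragment for illustration only (assumed markup, not copied from the site).
            Dim sample = "<a href=""/siteowners/certify?ax=1"">8,893</a> " &
                         "<a href=""/siteowners/certify?ax=2"">10,060</a>"
            For Each rank In ExtractRanks(sample)
                Console.WriteLine(rank)
            Next
        End Sub
    End Module
    ```

    With this sample input the loop prints 8,893 and then 10,060. Collecting all matches in a list also avoids assuming that a second match always exists.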