I'm working on a Visual Basic Project. My working environment is :
At this stage,I have :
I'm supposed to Scrape a webpage (i.e: https://www.example.com), I want to display the result of Scraping in a Treeview Control placed on Form1. I have tried some approaches and they worked fine, except that they require using Webbrowser Control which I do not wish to use. I found a method that I'm using now, but it seems to not letting me display the Results on the Form.
Here is my Code of Class1.vb and it's working fine
Imports System.Threading.Tasks
Public Class Class1
' Create a WebBrowser instance.
Private Event DocumentCompleted As WebBrowserDocumentCompletedEventHandler
Private ManufacturersURi As New Uri("https://www.example.com/Webpage.php3")
Public ManList As New List(Of TreeNode)
Public Sub GettHelpPage()
' Create a WebBrowser instance.
Dim webBrowserForPrinting As New WebBrowser() With {.ScriptErrorsSuppressed = True}
' Add an event handler that Scrape Data after it loads.
AddHandler webBrowserForPrinting.DocumentCompleted, New _
WebBrowserDocumentCompletedEventHandler(AddressOf GetManu_Name)
' Set the Url property to load the document.
webBrowserForPrinting.Url = ManufacturersURi
End Sub
Private Sub GetManu_Name(ByVal sender As Object, ByVal e As WebBrowserDocumentCompletedEventArgs)
Dim webBrowserForPrinting As WebBrowser = CType(sender, WebBrowser)
Dim Divs = webBrowserForPrinting.Document.Body.GetElementsByTagName("Div")
' Scrape the document now that it is fully loaded.
Dim T As Task(Of List(Of TreeNode)) =
Task.Run(Function()
Dim LinksCount As Integer = 0
For Each Div As HtmlElement In Divs
If InStr(Div.GetAttribute("ClassName").ToString, "Div-Name", CompareMethod.Text) Then
LinksCount = Div.GetElementsByTagName("a").Count - 1
For I As Integer = 0 To LinksCount
Dim Txt() As String = Div.GetElementsByTagName("a").Item(I).InnerHtml.Split("<BR>")
Dim Manu_TreeNode As New TreeNode() With
{.Name = I.ToString, .Text = Txt(0)}
ManList.Add(Manu_TreeNode)
Next
End If
Next
Return ManList
End Function)
' Dispose the WebBrowser now that the task is complete.
Debug.WriteLine(T.Result.Count) 'Result is 116
webBrowserForPrinting.Dispose()
End Sub
The above Code results 116 TreeNodes, which are the count of Tags that I scraped. Now when I attempt to display this result on Form1_Load, nothing happens, because the Form loads before the Code finishes executing.
Here is the Form1_Load Code :
Public Class Form1
Dim ThisClass As New Class1
Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
ThisClass.GetHelpPage()
TreeView1.Nodes.Clear()
For I As Integer = 0 To ThisClass.ManList.Count - 1
TreeView1.Nodes.Add(ThisClass.ManList(I))
Next
End Sub
End Class
I noticed that if I placed an empty msgbox("") in the Form1_Load somewhere before For..Next, it forces the Form1_Load Event to wait and successfully populates the TreeView Control.
What am I doing wrong ? or What am I missing there ?
I noticed that if I placed an empty msgbox("") in the Form1_Load somewhere before For..Next, it forces the Form1_Load Event to wait and successfully populates the TreeView Control.
Yes, it plays the await
role if you keep it open long enough until the task in the GetManu_Name
method is completed. Since the MsgBox is a modal window which blocks the next lines from being executed until it been closed.
Now, either you make it a complete synchronous call by removing the Task.Run(...)
from the GetManu_Name
method, or utilize an asynchronous pattern in such a way as:
Public Class WebStuff
Public Shared Async Function ToTreeNodes(url As String) As Task(Of IEnumerable(Of TreeNode))
Dim tcsNavigated As New TaskCompletionSource(Of Boolean)
Dim tcsCompleted As New TaskCompletionSource(Of Boolean)
Dim nodes As New List(Of TreeNode)
Using wb As New WebBrowser With {.ScriptErrorsSuppressed = True}
AddHandler wb.Navigated,
Sub(s, e)
If tcsNavigated.Task.IsCompleted Then Return
tcsNavigated.SetResult(True)
End Sub
AddHandler wb.DocumentCompleted,
Sub(s, e)
If wb.ReadyState <> WebBrowserReadyState.Complete OrElse
tcsCompleted.Task.IsCompleted Then Return
tcsCompleted.SetResult(True)
End Sub
wb.Navigate(url)
Await tcsNavigated.Task
'Navigated.. if you need to do something here...
Await tcsCompleted.Task
'DocumentCompeleted.. Now we can process the Body...
Dim Divs = wb.Document.Body.GetElementsByTagName("Div")
Dim LinksCount As Integer = 0
For Each Div As HtmlElement In Divs
If Div.GetAttribute("ClassName").
IndexOf("Div-Name", StringComparison.InvariantCultureIgnoreCase) > -1 Then
LinksCount = Div.GetElementsByTagName("a").Count - 1
For I As Integer = 0 To LinksCount
Dim Txt = Div.GetElementsByTagName("a").Item(I).InnerHtml.
Split({"<BR>"}, StringSplitOptions.RemoveEmptyEntries)
Dim n As New TreeNode With {
.Name = I.ToString, .Text = Txt.FirstOrDefault
}
nodes.Add(n)
Next
End If
Next
End Using
Return nodes
End Function
End Class
Notes on the method:
Async
function to do the lengthy task and returns IEnumerable(Of TreeNode) to the caller.You need to add the Async
modifier to the caller's signature to call the function and wait for the result. For example, the Form.Load
event:
Private Async Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
Dim nodes = Await WebStuff.ToTreeNodes("www....")
TreeView1.Nodes.AddRange(nodes.ToArray)
End Sub
Or Async
method:
Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
PopulateTree()
End Sub
Private Async Sub PopulateTree()
Dim nodes = Await WebStuff.ToTreeNodes("www....")
TreeView1.Nodes.AddRange(nodes.ToArray)
End Sub