vb.net, treeview

Display results scraped from a webpage in a TreeView control from a class


I'm working on a Visual Basic project. My working environment is:

  • Windows 10 32bit
  • Visual Studio 2015
  • .NET Framework 4.8
  • WinForms

At this stage, I have:

  • Class (Class1.vb)
  • Form1 (Form1.vb) with TreeView Control

I'm supposed to scrape a webpage (e.g. https://www.example.com) and display the scraped results in a TreeView control placed on Form1. I have tried some approaches and they worked fine, except that they require a WebBrowser control, which I do not wish to use. I found a method that I'm using now, but it doesn't seem to let me display the results on the form.

Here is my code for Class1.vb; it's working fine:

    Imports System.Diagnostics
    Imports System.Threading.Tasks
    Imports System.Windows.Forms

    Public Class Class1
        Private ManufacturersURi As New Uri("https://www.example.com/Webpage.php3")
        Public ManList As New List(Of TreeNode)

        Public Sub GetHelpPage()
            ' Create a WebBrowser instance.
            Dim webBrowserForPrinting As New WebBrowser() With {.ScriptErrorsSuppressed = True}
            ' Add an event handler that scrapes the data once the page has loaded.
            AddHandler webBrowserForPrinting.DocumentCompleted,
                New WebBrowserDocumentCompletedEventHandler(AddressOf GetManu_Name)
            ' Setting the Url property starts loading the document; it returns immediately.
            webBrowserForPrinting.Url = ManufacturersURi
        End Sub

        Private Sub GetManu_Name(ByVal sender As Object, ByVal e As WebBrowserDocumentCompletedEventArgs)
            Dim webBrowserForPrinting As WebBrowser = CType(sender, WebBrowser)
            Dim Divs = webBrowserForPrinting.Document.Body.GetElementsByTagName("Div")
            ' Scrape the document now that it is fully loaded.
            Dim T As Task(Of List(Of TreeNode)) =
                Task.Run(Function()
                             Dim LinksCount As Integer = 0
                             For Each Div As HtmlElement In Divs
                                 If InStr(Div.GetAttribute("ClassName"), "Div-Name", CompareMethod.Text) > 0 Then
                                     LinksCount = Div.GetElementsByTagName("a").Count - 1
                                     For I As Integer = 0 To LinksCount
                                         ' Keep the text before the first <BR>.
                                         Dim Txt() As String = Div.GetElementsByTagName("a").Item(I).InnerHtml.
                                             Split({"<BR>"}, StringSplitOptions.None)
                                         Dim Manu_TreeNode As New TreeNode() With
                                             {.Name = I.ToString, .Text = Txt(0)}
                                         ManList.Add(Manu_TreeNode)
                                     Next
                                 End If
                             Next
                             Return ManList
                         End Function)
            Debug.WriteLine(T.Result.Count) 'Result is 116
            ' Dispose the WebBrowser now that the task is complete.
            webBrowserForPrinting.Dispose()
        End Sub
    End Class

The code above produces 116 TreeNodes, one for each tag I scraped. However, when I try to display this result in Form1_Load, nothing happens, because the form loads before the scraping code finishes executing.
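The timing problem can be reproduced without a WebBrowser. The console sketch below is only an illustration under stated assumptions: StartLoading is a hypothetical stand-in for GetHelpPage, Task.Delay(200) simulates the page download, and the three names are made-up data. Reading the list right after starting the work, as the For loop in Form1_Load does, happens before the list is filled; waiting for the work first sees the result.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Threading.Tasks

Public Module TimingSketch
    ' Hypothetical stand-in for Class1.ManList, filled asynchronously.
    Public ReadOnly Items As New List(Of String)

    ' Hypothetical stand-in for GetHelpPage: starts the work and returns at once,
    ' just as setting WebBrowser.Url returns before DocumentCompleted is raised.
    Public Function StartLoading() As Task
        Return Task.Run(Async Function()
                            Await Task.Delay(200) ' simulated download time
                            SyncLock Items
                                Items.AddRange({"Acme", "Globex", "Initech"})
                            End SyncLock
                        End Function)
    End Function

    Public Sub Main()
        Dim loading As Task = StartLoading()
        ' Like the loop in Form1_Load: runs before the work has finished,
        ' so the list is typically still empty here.
        Console.WriteLine($"Read immediately: {Items.Count}")
        ' Blocking is acceptable in a console app; in a form, use Await instead.
        loading.Wait()
        Console.WriteLine($"Read after waiting: {Items.Count}")
    End Sub
End Module
```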

Here is the Form1_Load code:

    Public Class Form1
        Dim ThisClass As New Class1
        Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
            ThisClass.GetHelpPage()
            TreeView1.Nodes.Clear()
            For I As Integer = 0 To ThisClass.ManList.Count - 1
                TreeView1.Nodes.Add(ThisClass.ManList(I))
            Next
        End Sub
    End Class

I noticed that if I place an empty MsgBox("") somewhere in Form1_Load before the For...Next loop, it forces Form1_Load to wait, and the TreeView control is populated successfully.

What am I doing wrong, or what am I missing?


Solution

  • I noticed that if I place an empty MsgBox("") somewhere in Form1_Load before the For...Next loop, it forces Form1_Load to wait, and the TreeView control is populated successfully.

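    Yes, the MsgBox plays the role of Await, provided you keep it open until the task in the GetManu_Name method has completed. A MsgBox is a modal window: it blocks the lines after it from executing until it is closed, and while it is open its message loop keeps running, which lets the WebBrowser finish loading and raise DocumentCompleted.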
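Relying on how long a modal box stays open is fragile. One alternative, sketched here and not taken from the original answer, is to let the class announce that it has finished by raising its own event, and to fill the TreeView in the handler. ScraperSketch, ScrapeCompleted and OnPageLoaded are hypothetical names; OnPageLoaded is called directly to stand in for the WebBrowser's DocumentCompleted handler, and plain strings stand in for TreeNodes so the sketch runs as a console app.

```vbnet
Imports System
Imports System.Collections.Generic

Public Class ScraperSketch
    ' Raised once the items are ready; Form1 would handle this and fill the TreeView.
    Public Event ScrapeCompleted(items As List(Of String))

    ' Hypothetical stand-in for GetManu_Name: in Class1 this logic would run inside
    ' the DocumentCompleted handler, after the page has fully loaded.
    Public Sub OnPageLoaded(linkHtml As IEnumerable(Of String))
        Dim items As New List(Of String)
        For Each html In linkHtml
            ' Keep only the text before the first <BR>, as in the question.
            Dim parts = html.Split({"<BR>"}, StringSplitOptions.None)
            items.Add(parts(0))
        Next
        RaiseEvent ScrapeCompleted(items)
    End Sub
End Class

Public Module EventSketch
    Public Sub Main()
        Dim scraper As New ScraperSketch()
        ' Equivalent of Form1 handling the event and adding the nodes there.
        AddHandler scraper.ScrapeCompleted,
            Sub(items) Console.WriteLine($"Received {items.Count} items: {String.Join(", ", items)}")
        ' Simulate the page finishing loading (made-up data).
        scraper.OnPageLoaded({"Acme<BR>Tools", "Globex<BR>Chemicals"})
    End Sub
End Module
```

The consumer never reads the list until the producer says it is ready, so no waiting trick is needed in Form1_Load.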

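    Now, you can either make the scraping fully synchronous by removing the Task.Run(...) from the GetManu_Name method (the page itself still loads asynchronously, so the TreeView would then have to be filled only after DocumentCompleted has fired, not directly in Form1_Load), or use an asynchronous pattern such as: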

    Imports System.Linq
    Imports System.Threading.Tasks
    Imports System.Windows.Forms

    Public Class WebStuff
    
        Public Shared Async Function ToTreeNodes(url As String) As Task(Of IEnumerable(Of TreeNode))
            Dim tcsNavigated As New TaskCompletionSource(Of Boolean)
            Dim tcsCompleted As New TaskCompletionSource(Of Boolean)
            Dim nodes As New List(Of TreeNode)
    
            Using wb As New WebBrowser With {.ScriptErrorsSuppressed = True}
                AddHandler wb.Navigated,
                    Sub(s, e)
                        If tcsNavigated.Task.IsCompleted Then Return
                        tcsNavigated.SetResult(True)
                    End Sub
    
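                AddHandler wb.DocumentCompleted,
                    Sub(s, e)
                        ' DocumentCompleted can fire more than once (e.g. for frames),
                        ' so wait until the whole page reports Complete.
                        If wb.ReadyState <> WebBrowserReadyState.Complete OrElse
                            tcsCompleted.Task.IsCompleted Then Return
                        tcsCompleted.SetResult(True)
                    End Sub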
    
                wb.Navigate(url)
    
                Await tcsNavigated.Task
                'Navigated.. if you need to do something here...
                Await tcsCompleted.Task
                'DocumentCompleted.. Now we can process the Body...
    
                Dim Divs = wb.Document.Body.GetElementsByTagName("Div")

                For Each Div As HtmlElement In Divs
                    If Div.GetAttribute("ClassName").
                        IndexOf("Div-Name", StringComparison.InvariantCultureIgnoreCase) > -1 Then
                        ' Fetch the links once instead of on every iteration.
                        Dim links = Div.GetElementsByTagName("a")
                        For I As Integer = 0 To links.Count - 1
                            Dim Txt = links.Item(I).InnerHtml.
                                Split({"<BR>"}, StringSplitOptions.RemoveEmptyEntries)
                            Dim n As New TreeNode With {
                                .Name = I.ToString, .Text = Txt.FirstOrDefault
                            }
                            nodes.Add(n)
                        Next
                    End If
                Next
            End Using
    
            Return nodes
        End Function
    
    End Class
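The key idea in ToTreeNodes is turning WebBrowser events into tasks with TaskCompletionSource, so the code after Await runs only once the event has fired. The console sketch below shows the same pattern in isolation, under assumptions: FakePage, its Loaded event and LoadAsync are hypothetical, and Task.Delay simulates the page load. It also removes the handler afterwards and uses TrySetResult, which simply returns False if the task is already completed, in place of the IsCompleted checks.

```vbnet
Imports System
Imports System.Threading.Tasks

' Hypothetical event source standing in for the WebBrowser.
Public Class FakePage
    Public Event Loaded As EventHandler

    Public Sub StartLoad()
        ' Raise Loaded later, the way DocumentCompleted fires after navigation.
        Task.Delay(100).ContinueWith(Sub(t) OnLoaded())
    End Sub

    Private Sub OnLoaded()
        RaiseEvent Loaded(Me, EventArgs.Empty)
    End Sub
End Class

Public Module TcsSketch
    ' Wraps the event in a Task that completes when Loaded is raised.
    Public Async Function LoadAsync(page As FakePage) As Task(Of String)
        Dim tcs As New TaskCompletionSource(Of Boolean)()
        Dim handler As EventHandler = Sub(s, e) tcs.TrySetResult(True)

        AddHandler page.Loaded, handler
        Try
            page.StartLoad()
            Await tcs.Task ' nothing below runs until Loaded has fired
        Finally
            RemoveHandler page.Loaded, handler
        End Try
        Return "page loaded"
    End Function

    Public Sub Main()
        ' Blocking with .Result is fine in a console app; in a form, Await instead.
        Console.WriteLine(LoadAsync(New FakePage()).Result)
    End Sub
End Module
```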
    

    Notes on the method:

    To call the function and wait for its result, add the Async modifier to the caller's signature and Await the call. For example, in the Form.Load event handler:

    Private Async Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
        Dim nodes = Await WebStuff.ToTreeNodes("www....")
        TreeView1.Nodes.AddRange(nodes.ToArray)
    End Sub
    

    Or call it from a separate Async method:

    Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
        PopulateTree()
    End Sub
    
    Private Async Sub PopulateTree()
        Dim nodes = Await WebStuff.ToTreeNodes("www....")
        TreeView1.Nodes.AddRange(nodes.ToArray)
    End Sub
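A caveat on the second variant: an Async Sub cannot be awaited, so Form1_Load cannot tell when PopulateTree finishes, and exceptions thrown inside it are not returned to the caller. A common alternative is an Async Function that returns a Task. The console sketch below illustrates the difference; PopulateTreeAsync, its fail flag and the node count are hypothetical.

```vbnet
Imports System
Imports System.Threading.Tasks

Public Module AsyncFunctionSketch
    ' Hypothetical stand-in for PopulateTree; returns a Task so callers can Await it.
    Public Async Function PopulateTreeAsync(fail As Boolean) As Task(Of Integer)
        Await Task.Delay(50) ' simulated download
        If fail Then Throw New InvalidOperationException("download failed")
        Return 3 ' made-up node count
    End Function

    Public Async Function RunAsync() As Task
        ' The caller can Await the result and catch errors, which an Async Sub does not allow.
        Try
            Dim count = Await PopulateTreeAsync(fail:=False)
            Console.WriteLine($"Added {count} nodes")
            Await PopulateTreeAsync(fail:=True)
        Catch ex As InvalidOperationException
            Console.WriteLine($"Caught: {ex.Message}")
        End Try
    End Function

    Public Sub Main()
        RunAsync().Wait()
    End Sub
End Module
```

Event handlers such as Form1_Load are the usual place where Async Sub is acceptable; from there, Await a Task-returning method inside a Try block.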