Search code examples
vbaexcelweb-scrapinginternet-explorer-11

Can't fetch the titles from a webpage


I've written a script in vba in combination with IE to get the titles of diferent charts from a webpage but I'm not being able to. It seems I've used right class names along with tag names to reach the content but no dice. It doesn't throw any error either.

This is my approach so far:

Sub GetTitle()
    Const Url As String = "https://www.fbatoolkit.com/"
    Dim IE As New InternetExplorer, Html As HTMLDocument
    Dim post As Object

    With IE
        .Visible = True
        .navigate Url
        While .Busy = True Or .readyState < 4: DoEvents: Wend
        Set Html = .document
    End With

    Application.Wait Now + TimeValue("00:00:05")

    For Each post In Html.getElementsByClassName("chart")
        With post.getElementsByTagName("text")
          If .Length Then R = R + 1: Cells(R, 1) = .item(0).innerText
        End With
    Next post
End Sub

The titles are like below which are visible above each charts:

Toys & Games
Health & Household

I'm not expecting any solution related to selenium. Thanks.


Solution

  • This is a bit of cheat to be honest. Consider it a placeholder until I find a better way as I am guessing you specifically want to access those titles.

    Option Explicit
    Public Sub GetInfo()
        Dim ie As New InternetExplorer, html As HTMLDocument, titles(), i As Long
        With ie
            .Visible = True
            .navigate "https://www.fbatoolkit.com/"
            While .Busy Or .readyState < 4: DoEvents: Wend
            Set html = .document
            titles = GetTitles(html.body.innerHTML, "id=""visualization([^""]*)")
            For i = LBound(titles) To UBound(titles)
                Debug.Print titles(i)
            Next
            .Quit '<== Remember to quit application
        End With
    End Sub
    
    Public Function GetTitles(ByVal inputString As String, ByVal sPattern As String) As Variant
        Dim Matches As Object, iMatch As Object, s As String, arrMatches(), i As Long
        With CreateObject("vbscript.regexp")
            .Global = True
            .MultiLine = True
            .IgnoreCase = True
            .Pattern = sPattern
            If .test(inputString) Then
                Set Matches = .Execute(inputString)
                For Each iMatch In Matches
                    If iMatch.SubMatches(0) <> vbNullString Then
                        ReDim Preserve arrMatches(i)
                        arrMatches(i) = Replace$(Replace$(iMatch.SubMatches(0), Chr$(95), Chr$(32)), Chr$(32) & Chr$(32), Chr$(32) & Chr$(38) & Chr$(32))
                        i = i + 1
                    End If
                Next iMatch
            End If
        End With
        GetTitles = arrMatches
    End Function