Search code examples
htmlvbacollectionselementclassname

Why am I not able to add an HTML Classname to an Element Collection using MSXML2 with VBA


I have tried many proven methods from various posts to get some data from a web page without success. I am able to get a list of linked items on the opening page but once I navigate to any other page, I draw a blank with the code below.

When I run the code, I get no results in Cats.

Sub Main()
Dim XMLReq As New MSXML2.XMLHTTP60
Dim HTMLDoc As New MSHTML.HTMLDocument

Dim Cats As MSHTML.IHTMLElementCollection
Dim Cat As MSHTML.IHTMLElement
Dim NextHref As String
Dim NextURL As String

XMLReq.Open "GET", URL, False
XMLReq.send

If XMLReq.Status <> 200 Then
    MsgBox "Problem"
    Exit Sub
End If

HTMLDoc.body.innerHTML = XMLReq.responseText
Set XMLReq = Nothing

Set Cats = HTMLDoc.getElementsByClassName("ng-tns-c329-5 product-grid--tile ng-star-inserted")

Debug.Print Cats.Length 'Returns 0

'For Each Cat In Cats
'    NextHref = Cat.getAttribute("href")
'    NextURL = URL & Mid(NextHref, InStr(NextHref, ":") + 2)
'    ListItemsInCats Cat.innerText, NextURL

'Next Cat

End Sub

Expanded Element structure

Collased structure

Thanks for any assistance.


Solution

  • The problem with the website you are trying to scrape from is that:

    In XMLHTTP Request method - The product details are dynamic content that is pulled from Fetch/XHR which XMLHTTP does not run, XMLHTTP only gives you the HTML document as it is without any script running.

    In Internet Explorer method - The webpage is considered ready before the product details are actually loaded so the usual loop check for Busy and ReadyState is not sufficient.

    The code below uses Internet Explorer and to resolve the issue mentioned above, I have put up some checks (Which is not perfect I believe but it works so far in my testing) that will wait until the first product has been loaded before proceeding to pull the product details:

    Private Sub GetBakeryProducts()
        Const URL As String = "https://www.woolworths.com.au/shop/browse/bakery"
        
        Dim ieObj As InternetExplorer
        Set ieObj = New InternetExplorer
        
        ieObj.navigate URL
        ieObj.Visible = True
        
        Do While ieObj.Busy Or ieObj.readyState <> READYSTATE_COMPLETE
            DoEvents
        Loop
        
        Do While ieObj.document.getElementsByClassName("productCarousel-header").Length = 0
            DoEvents
        Loop
            
        Dim ieDoc As MSHTML.HTMLDocument
        Set ieDoc = ieObj.document
    
        Dim productList As Object
        Set productList = ieDoc.getElementsByClassName("product-grid--tile")
            
        '==== Test if the website has finish loading the 1st product details
        On Error Resume Next
        Dim testStatus As String
        Do
            Err.Clear
            testStatus = productList(0).getElementsByClassName("shelfProductTile-descriptionLink")(0).innerText
        Loop Until Err.Number = 0
        '====
        
        Dim outputArr() As String
        ReDim outputArr(1 To productList.Length, 1 To 2) As String
        Dim outputIndex As Long
        
        Dim i As Long
        For i = 0 To productList.Length - 1
            If productList(i).getElementsByClassName("shelfProductTile-descriptionLink").Length <> 0 Then
                If Err.Number <> 0 Then
                    Err.Clear
                    Exit For
                End If
                
                Dim productName As String
                Dim productPrice As String
                
                productName = productList(i).getElementsByClassName("shelfProductTile-descriptionLink")(0).innerText
                productPrice = Replace(productList(i).getElementsByClassName("price")(0).innerText, vbNewLine, vbNullString)
                
                outputIndex = outputIndex + 1
                outputArr(outputIndex, 1) = productName
                outputArr(outputIndex, 2) = productPrice
            End If
        Next i
        
        ReDim Preserve outputArr(1 To outputIndex, 1 To 2) As String
        
        ieObj.Quit
        Set ieObj = Nothing
        
        ThisWorkbook.Sheets("Sheet1").Range("A1").Resize(outputIndex, UBound(outputArr, 2)).Value = outputArr
    End Sub
    

    Running this will pull the data from the website and paste the output starting from cell A1 in Sheet1, please change the worksheet name and range as you see fits.