Search code examples
excelvbaweb-scraping

Does AJAX Load More interrupt VBA web scraping?


I have a VBA module in Excel that's simply not returning any data. It calls up the page in question in IE but goes no further, and I'm not getting any error messages

The page loads 100 rows of the table first, then loads more. Could that be the problem?

Here's my code:

Sub TeamRank()
    Dim ieObj As InternetExplorer
    Dim htmlEle As IHTMLElement
    Dim i As Integer
    
    i = 1
    
    Set ieObj = New InternetExplorer
    ieObj.Visible = True
    ieObj.navigate "https://youthsoccerrankings.us/rankings/VA/14/Boys/"
    
    Application.Wait Now + TimeValue("00:00:10")
    
    
    For Each htmlEle In ieObj.document.getElementsByClassName("teamRow")(0).getElementsByTagName("tr")
        
        With ActiveSheet
            .Range("A" & i).Value = htmlEle.Children(0).textContent
            .Range("B" & i).Value = htmlEle.Children(1).textContent
            .Range("C" & i).Value = htmlEle.Children(2).textContent
        End With
        
        i = i + 1
        
    Next htmlEle
End Sub

Solution

  • Selection issue

    Seems like you were not looking at the right collection for your loop. The elements with class "teamRow" are already the <tr> elements, so you don't need to select them after that.

    For example, you can try this:

    Sub TeamRank()
        Dim ieObj As InternetExplorer
        Dim htmlEle As IHTMLElement
        Dim i As Integer
        
        i = 1
        
        Set ieObj = New InternetExplorer
        ieObj.Visible = True
        ieObj.Navigate "https://youthsoccerrankings.us/rankings/VA/14/Boys/"
        
        Application.Wait Now + TimeValue("00:00:10")
        
        
        For Each htmlEle In ieObj.Document.getElementsByClassName("teamRow")
            
            With ActiveSheet
                .Range("A" & i).Value = htmlEle.Children(0).textContent
                .Range("B" & i).Value = htmlEle.Children(1).textContent
                .Range("C" & i).Value = htmlEle.Children(2).textContent
            End With
            
            i = i + 1
            
        Next htmlEle
        
        Set ieObj = Nothing
        
    End Sub
    

    AJAX / Scrolling issue

    Regarding the Ajax update of the table, it does limit the download of the data to 100 rows. You could make Internet Explorer scroll to get the Ajax request to kick in and load the additional data. One way to do this would be like that:

    Sub TeamRank()
        Dim ieObj As InternetExplorer
        Dim htmlEle As IHTMLElement
        Dim i As Integer
        
        i = 1
        
        Set ieObj = New InternetExplorer 'CreateObject("InternetExplorer.Application")
        ieObj.Visible = True
        ieObj.Navigate "https://youthsoccerrankings.us/rankings/VA/14/Boys/"
        
        Do
            Application.Wait Now + TimeValue("00:00:01")
            DoEvents
        Loop Until ieObj.readyState = 4 'READYSTATE_COMPLETE
        
        ieObj.Document.parentWindow.Scroll 0, 9999
        ieObj.Refresh
        
        Do
            Application.Wait Now + TimeValue("00:00:01")
            DoEvents
        Loop Until ieObj.readyState = 4 'READYSTATE_COMPLETE
        
        For Each htmlEle In ieObj.Document.getElementsByClassName("teamRow")
            
            With ActiveSheet
                .Range("A" & i).Value = htmlEle.Children(0).textContent
                .Range("B" & i).Value = htmlEle.Children(1).textContent
                .Range("C" & i).Value = htmlEle.Children(2).textContent
            End With
            
            i = i + 1
            
        Next htmlEle
        
        Set ieObj = Nothing
        
    End Sub
    

    Explanations:

    Note first that I've replaced your wait time of 10 sec by a loop that waits for 1 second each time and checks if IE is loaded so you don't have to wait the full 10 sec every time as the code will resume as soon as IE is ready.

        'Excerpt from the code above:
        Do
            Application.Wait Now + TimeValue("00:00:01")
            DoEvents
        Loop Until ieObj.readyState = 4 'READYSTATE_COMPLETE
        
        ieObj.Document.parentWindow.Scroll 0, 9999
        ieObj.Refresh
        
        Do
            Application.Wait Now + TimeValue("00:00:01")
            DoEvents
        Loop Until ieObj.readyState = 4 'READYSTATE_COMPLETE
    

    You can see that after I've scrolled, I refresh the page. This is because during testing, I realised that IE doesn't seem to update its DOM object after the AJAX request so refreshing at the newly scrolled position is necessary so that the page will reload at the new location and ieObj will have access to all the rows.