Search code examples
vbaweb-scrapingreadystate

IE Readystate stuck at 1 during VBA webscraping


When attempting to reach a URL my code gets stuck in my readystate loop and never loads. The readystate remains at 1 permanently. If I pause the code and hit debug the cursor skips around my procedure in a bizarre order, sometimes to the end then to the beginning, sometimes right back to the beginning of the sub.

I read that this might be an issue with the javascript but I can't seem to find any resolution.

Is there a way to get this to work?

Sub Navigate()

    IE.Visible = True
    IE.Navigate ("http://web.vermont.org/Accounting?ysort=true")

    Do While IE.ReadyState <> 4
           DoEvents
    Loop


    Set Doc = IE.Document

End Sub

Solution

  • That server seems to respond quite nicely to XML requests and does not require that you move to subsequent pages for the remainder for the content.

    Sub Get_Listings()
        Dim sURL As String, iDIV As Long, htmlBDY As HTMLDocument, xmlHTTP As MSXML2.ServerXMLHTTP60
    
        Set xmlHTTP = New MSXML2.ServerXMLHTTP60
        Set htmlBDY = New HTMLDocument
    
        'sURL = "http://web.vermont.org/Accounting?ysort=true"
        sURL = "http://web.vermont.org/Dining?ysort=true"
    
    
        With xmlHTTP
            .Open "GET", sURL, False
            .setRequestHeader "Content-Type", "text/xml"
            .send
            Do While .readyState <> READYSTATE_COMPLETE: DoEvents: Loop
            If .Status <> 200 Then GoTo CleanUp
            htmlBDY.body.innerHTML = .responseText
        End With
    
        With htmlBDY
            For iDIV = 0 To (.getElementsByclassname("ListingResults_All_ENTRYTITLELEFTBOX").Length - 1)
                If CBool(.getElementsByclassname("ListingResults_All_ENTRYTITLELEFTBOX")(iDIV).getElementsByTagName("a").Length) Then
                    Debug.Print _
                      .getElementsByclassname("ListingResults_All_ENTRYTITLELEFTBOX")(iDIV).getElementsByTagName("a")(0).innertext
                End If
            Next iDIV
        End With
    
    CleanUp:
        Set htmlBDY = Nothing
        Set xmlHTTP = Nothing
    End Sub
    

    You will need Microsoft XML 6.0, Microsoft HTML Object Library and Microsoft Internet Controls added to Tools, References. I'm offering this snippet as I could find no Terms of Use on that site that banned the use of robotic scrapers. Be careful that you do not get your IP banned due to repetitive scraping requests.