Search code examples
excelvbaxmlhttprequestweb-crawlerserverxmlhttp

How to reset XMLHTTP connections in VBA


I'm trying to programme a VBA macro to scrape a bunch of data on power cuts in India. The macro is supposed to cycle through several hundreds of URLs generated in my excel file, and create an XMLHTTP request for each. For each URL, I'm also checking if current data is available, and I'm getting the most recent data where that's not possible.

Basically, whenever data isn't available, the website gives a response containing "Data is available for below date" and the dates for which data is available. I'm then using that string to generate a new link to the most recent data available. This way a formula is supposed to turn a link like: https://www.watchyourpower.org/reports.php?location_id=729&from_date=12%2F04%2F2020&to_date=12%2F05%2F2020

into something like: https://www.watchyourpower.org/reports.php?location_id=733&from_date=13%2F05%2F2018&to_date=13%2F06%2F2018

Pulling data from one URL at a time works, but when I'm trying to pull data from a URL generated as described above, I'm getting only the string that is still cached from the first request. How do I reset the XMLHTTP request so I can use an alternative URL / cycle through lots of URLs generated in my Excel file? I've spent the past few hours searching through forums but haven't really found anything.

Sorry if I'm overseeing something here. I'm not very experienced at coding and have stitched together my code from lots of different forum posts, including these two sites on stackoverflow: Extract table from webpage using VBA & VBA XMLHTTP clear authentication?

Here's my code:

Public Sub DataScraper()

Dim sResponse As String, html As HTMLDocument, clipboard, xmlhttp As Object

    Set html = New HTMLDocument
    Set xmlhttp = CreateObject("MSXML2.XMLHTTP")
    With xmlhttp
        .Open "GET", ThisWorkbook.Sheets("Link Generator").Range("b3").Value, False
        .setRequestHeader "Cache-Control", "no-cache"
        .setRequestHeader "Pragma", "no-cache"
        .setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
        .send
        sResponse = StrConv(.responseBody, vbUnicode)
    End With
    html.body.innerHTML = sResponse


If InStr(html.getElementsByTagName("span")(12).innerText, "Data is available for below date range") > 0 Then
Worksheets("Link Generator").Range("e3") = Right(html.getElementsByTagName("span")(12).innerText, 29)
Worksheets("Link Generator").Calculate


    Set html = New HTMLDocument
    Set xmlhttp = CreateObject("MSXML2.XMLHTTP")
    With xmlhttp
        .Open "GET", ThisWorkbook.Sheets("Link Generator").Range("g3").Value, False
        .setRequestHeader "Cache-Control", "no-cache"
        .setRequestHeader "Pragma", "no-cache"
        .setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
        .send
        sResponse = StrConv(.responseBody, vbUnicode)
    End With
    html.body.innerHTML = sResponse


End If


    Sheets.Add.Name = Sheets("Link Generator").Range("a3").Value

    With html
        Set clipboard = GetObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}")
        clipboard.SetText .getElementsByTagName("table")(2).outerHTML
        clipboard.PutInClipboard
    End With

    Worksheets(Sheets("Link Generator").Range("a3").Value).Range("b4").PasteSpecial

This always results in an

Object variable or With block variable not set

error at line:

clipboard.SetText .getElementsByTagName("table")(2).outerHTML

Solution

  • For starts, just some general advice, you could make the bit where you create and send a request into its own single function that returns the html, then you can call it whenever needed, so you don't repeat your code and you don't risk using an existing object - something like:

    Public Function SendRequest(URL As String) As HTMLDocument
        Dim html As HTMLDocument
        Set html = New HTMLDocument
        Set xmlhttp = CreateObject("MSXML2.XMLHTTP")
        With xmlhttp
            .Open "GET", URL, False
            .setRequestHeader "Cache-Control", "no-cache"
            .setRequestHeader "Pragma", "no-cache"
            .setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
            .send
            sResponse = StrConv(.responseBody, vbUnicode)
        End With
        html.body.innerHTML = sResponse
        SendRequest = html
    End Function
    
    Public Sub DataScraper()
    
        Dim html As HTMLDocument, clipboard, xmlhttp As Object
        Set html = SendRequest(ThisWorkbook.Sheets("Link Generator").Range("b3").Value)
    
        If InStr(html.getElementsByTagName("span")(12).innerText, "Data is available for below date range") > 0 Then
            Worksheets("Link Generator").Range("e3") = Right(html.getElementsByTagName("span")(12).innerText, 29)
            Worksheets("Link Generator").Calculate
            Set html = SendRequest(ThisWorkbook.Sheets("Link Generator").Range("g3").Value)
        End If
    
        Sheets.Add.Name = Sheets("Link Generator").Range("a3").Value
    
        With html
            Set clipboard = GetObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}")
            clipboard.SetText .getElementsByTagName("table")(2).outerHTML
            clipboard.PutInClipboard
        End With
    
        Worksheets(Sheets("Link Generator").Range("a3").Value).Range("b4").PasteSpecial
    End Sub