I'm trying to programme a VBA macro to scrape a bunch of data on power cuts in India. The macro is supposed to cycle through several hundreds of URLs generated in my excel file, and create an XMLHTTP request for each. For each URL, I'm also checking if current data is available, and I'm getting the most recent data where that's not possible.
Basically, whenever data isn't available, the website gives a response containing "Data is available for below date" and the dates for which data is available. I'm then using that string to generate a new link to the most recent data available. This way a formula is supposed to turn a link like: https://www.watchyourpower.org/reports.php?location_id=729&from_date=12%2F04%2F2020&to_date=12%2F05%2F2020
into something like: https://www.watchyourpower.org/reports.php?location_id=733&from_date=13%2F05%2F2018&to_date=13%2F06%2F2018
Pulling data from one URL at a time works, but when I'm trying to pull data from a URL generated as described above, I'm getting only the string that is still cached from the first request. How do I reset the XMLHTTP request so I can use an alternative URL / cycle through lots of URLs generated in my Excel file? I've spent the past few hours searching through forums but haven't really found anything.
Sorry if I'm overseeing something here. I'm not very experienced at coding and have stitched together my code from lots of different forum posts, including these two sites on stackoverflow: Extract table from webpage using VBA & VBA XMLHTTP clear authentication?
Here's my code:
Public Sub DataScraper()
Dim sResponse As String, html As HTMLDocument, clipboard, xmlhttp As Object
Set html = New HTMLDocument
Set xmlhttp = CreateObject("MSXML2.XMLHTTP")
With xmlhttp
.Open "GET", ThisWorkbook.Sheets("Link Generator").Range("b3").Value, False
.setRequestHeader "Cache-Control", "no-cache"
.setRequestHeader "Pragma", "no-cache"
.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
.send
sResponse = StrConv(.responseBody, vbUnicode)
End With
html.body.innerHTML = sResponse
If InStr(html.getElementsByTagName("span")(12).innerText, "Data is available for below date range") > 0 Then
Worksheets("Link Generator").Range("e3") = Right(html.getElementsByTagName("span")(12).innerText, 29)
Worksheets("Link Generator").Calculate
Set html = New HTMLDocument
Set xmlhttp = CreateObject("MSXML2.XMLHTTP")
With xmlhttp
.Open "GET", ThisWorkbook.Sheets("Link Generator").Range("g3").Value, False
.setRequestHeader "Cache-Control", "no-cache"
.setRequestHeader "Pragma", "no-cache"
.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
.send
sResponse = StrConv(.responseBody, vbUnicode)
End With
html.body.innerHTML = sResponse
End If
Sheets.Add.Name = Sheets("Link Generator").Range("a3").Value
With html
Set clipboard = GetObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}")
clipboard.SetText .getElementsByTagName("table")(2).outerHTML
clipboard.PutInClipboard
End With
Worksheets(Sheets("Link Generator").Range("a3").Value).Range("b4").PasteSpecial
This always results in an
Object variable or With block variable not set
error at line:
clipboard.SetText .getElementsByTagName("table")(2).outerHTML
For starts, just some general advice, you could make the bit where you create and send a request into its own single function that returns the html, then you can call it whenever needed, so you don't repeat your code and you don't risk using an existing object - something like:
Public Function SendRequest(URL As String) As HTMLDocument
Dim html As HTMLDocument
Set html = New HTMLDocument
Set xmlhttp = CreateObject("MSXML2.XMLHTTP")
With xmlhttp
.Open "GET", URL, False
.setRequestHeader "Cache-Control", "no-cache"
.setRequestHeader "Pragma", "no-cache"
.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
.send
sResponse = StrConv(.responseBody, vbUnicode)
End With
html.body.innerHTML = sResponse
SendRequest = html
End Function
Public Sub DataScraper()
Dim html As HTMLDocument, clipboard, xmlhttp As Object
Set html = SendRequest(ThisWorkbook.Sheets("Link Generator").Range("b3").Value)
If InStr(html.getElementsByTagName("span")(12).innerText, "Data is available for below date range") > 0 Then
Worksheets("Link Generator").Range("e3") = Right(html.getElementsByTagName("span")(12).innerText, 29)
Worksheets("Link Generator").Calculate
Set html = SendRequest(ThisWorkbook.Sheets("Link Generator").Range("g3").Value)
End If
Sheets.Add.Name = Sheets("Link Generator").Range("a3").Value
With html
Set clipboard = GetObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}")
clipboard.SetText .getElementsByTagName("table")(2).outerHTML
clipboard.PutInClipboard
End With
Worksheets(Sheets("Link Generator").Range("a3").Value).Range("b4").PasteSpecial
End Sub