I need to monitor the content of some webpages (to see if any information posted on them has changed). For various reasons, I have to use an Excel spreadsheet to do this, so that limits me to using VBA to do the webscraping. I've investigated as many options as I could find on how to do this, but I've run into several dead ends.
Here's what I've found or otherwise encountered so far:
At first, I tried running some webscraping code (using Excel VBA) to directly access the webpages in question, as shown in this snippet:
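' Requires a project reference to "Microsoft HTML Object Library" (MSHTML)
Dim http As Object
Dim html As MSHTML.HTMLDocument
Dim sRequest As String
Dim sResponse As String
Set http = CreateObject("MSXML2.XMLHTTP")     ' http must be created before it can be used
Set html = New MSHTML.HTMLDocument
sRequest = ""
' urlString is assumed to hold the address of the page being monitored
http.Open "GET", urlString, False            ' synchronous GET request
http.setRequestHeader "Content-Type", "application/xml"
http.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:134.0) Gecko/20100101 Firefox/134.0"
http.Send sRequest
sResponse = http.responseText                ' only the HTML the server sends initially
html.body.innerHTML = sResponse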
However, this doesn't work because the content I need to scrape is inserted dynamically into a <div> element or <iframe> after the rest of the page has loaded. This means the initial response from the website doesn't contain the data I'm looking for, and I don't know how to access this added content (or whether it's even possible to do so) using only VBA.
The next option I explored was trying to control a browser so that I could access the content after it was loaded. However, the website I need to access doesn't support Internet Explorer (in any version), so any solution that relies on accessing an instance of Internet Explorer or Edge (in IE mode) isn't viable either.
Other posted solutions let you access and manipulate an Edge browser instance without IE mode, but they require installing specialized software like Selenium (or beautifulsoup, etc.) or making changes in the Registry. The problem with these solutions is that I'll eventually have to give this spreadsheet to people who probably won't be permitted to install software or make low-level system changes (i.e., in the Registry) due to company security policies, so those options aren't feasible either.
I've only found one option that might work, posted at the following page:
https://www.codeproject.com/Tips/5307593/Automate-Chrome-Edge-using-VBA
but there are problems with the code as written, and there doesn't appear to be a way to reach its author. It's described as being written for VBA, but the code is labelled "VB.Net" (which raises questions in my mind), and it uses a data type that isn't built into VBA, specifically the Dictionary data type. I know I can create a dictionary with CreateObject("Scripting.Dictionary"), but the variables that are declared as a Dictionary don't appear to be getting key-value pairs assigned to them (unless there's something going on that I'm not seeing). Nevertheless, I tried changing how those variables are declared (e.g., as Variant) and how the values are added to those dictionaries, but that only resulted in many other errors.
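For reference, here is a minimal late-bound sketch of how key-value pairs are normally added to a Scripting.Dictionary created with CreateObject, so that no project reference is needed. The function name and all keys and values are hypothetical placeholders, not taken from the CodeProject code. One common reason pairs seem to go missing when porting code is that a value which is itself an object must be assigned with Set.

```vba
' Late-bound sketch: no project reference required.
' BuildLateBoundDict and all keys/values are hypothetical placeholders.
Option Explicit

Public Function BuildLateBoundDict() As Object
    Dim d As Object                              ' As Object, since "Dictionary" is unknown without the reference
    Set d = CreateObject("Scripting.Dictionary")

    d.Add "sessionId", "abc123"                  ' .Add raises run-time error 457 if the key already exists
    d("targetId") = "page-1"                     ' default Item property: adds the key if missing, overwrites otherwise

    ' A value that is itself an object must be stored with Set;
    ' without Set, VBA evaluates the object's default member instead,
    ' which typically raises a run-time error.
    Dim params As Object
    Set params = CreateObject("Scripting.Dictionary")
    params("url") = "https://example.com"        ' hypothetical nested value
    Set d("params") = params

    Set BuildLateBoundDict = d                   ' object return values also need Set
End Function
```

Typing `? BuildLateBoundDict().Count` in the Immediate window should report the three keys added above.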
So my problem could be solved in one of two ways:
2. … the Dictionary data type issue, such as by creating my own data type definition or object class, or by finding another way to declare those variables.

Can anyone out there help?
For #2, you need to add a VBA project reference (Tools > References in the VBA editor) to the "Microsoft Scripting Runtime" library.
That gives you a Dictionary type. In a quick test, it seemed to work.
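A minimal early-bound sketch of what the reference enables, assuming "Microsoft Scripting Runtime" is checked under Tools > References. The function and sub names, keys, and values are hypothetical. Declarations such as `Dim d As Dictionary` then compile, but VB.NET generic forms like `Dictionary(Of String, Object)`, if the ported code contains any, have no VBA equivalent and would need to be reduced to a plain Scripting.Dictionary.

```vba
' Early-bound sketch: requires Tools > References > "Microsoft Scripting Runtime".
' Names, keys and values below are hypothetical placeholders.
Option Explicit

Public Function BuildEarlyBoundDict() As Scripting.Dictionary
    Dim d As Scripting.Dictionary                ' qualified name avoids clashes with other libraries' "Dictionary"
    Set d = New Scripting.Dictionary             ' explicit Set ... New rather than "Dim d As New ..."

    d.Add "method", "example.method"
    d("id") = 1

    Set BuildEarlyBoundDict = d
End Function

Public Sub PrintEarlyBoundDict()
    Dim d As Scripting.Dictionary
    Dim k As Variant                             ' For Each over .Keys needs a Variant loop variable
    Set d = BuildEarlyBoundDict()
    For Each k In d.Keys
        Debug.Print k, d(k)                      ' prints each key and its value to the Immediate window
    Next k
End Sub
```

Using explicit `Set ... New` instead of `As New` keeps the object from being silently re-created whenever it is referenced after being set to Nothing.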