Search code examples
excelvbaseleniuminternet-explorer-11autoit

Using Excel VBA to load a website that is incompatible with IE11


In Excel VBA to load a website and get it into a sheet I have been using the following:

Dim IE As Object  
Set IE = CreateObject("InternetExplorer.Application")  
IE .navigate "https://www.wsj.com/market-data/bonds/treasuries"

And then I can copy and paste it into my Excel sheet. But this website no longer works with IE11, and Excel VBA insists on using IE11 even though it is about to be deprecated.

Is there another way? I have also looked at:

  • Selenium: but it seems to be pretty much obsolete for VBA (not updated since 2016) and I couldn’t get it to work with Edge or Firefox in VBA anyway.

  • AutoIt: I got it to write the website’s HTML code to a TXT file (oHTTP = ObjCreate("winhttp.winhttprequest.5.1") ; $oHTTP.Open("GET", $URL1, False) ; $oHTTP.Send(); $oReceived = $oHTTP.ResponseText; FileWrite($file, $oReceived)) but the txt file contents are far from convenient as there is endless HTML stuff in it. It’ll take a fair amount of VBA code to sort through the mess, which probably means it won’t be reliable going forward. Also given the size of my workbook which is very slow, it will take literally several minutes to copy the website data into a sheet element by element.

Surely there must be an easy way to load the site, or just the table within the site, into an Excel sheet? This must be a well trodden path, but after much googling I can’t find an easy solution that actually works.

I have a 5-10 web pages being loaded into this workbook, and it seems to be a full time job keeping the whole thing working!! Any thoughts/help very much appreciated!!!


Solution

  • Similar idea to Christopher's answer in using regex. I am grabbing the instruments data (JS array), splitting the component dictionaries out (minus the end }), and then use regex, based on headers, to grab the appropriate values.

    I use a dictionary to handle input/output headers, and set a couple of request headers to help to signal browser based request and to mitigate for being served cached results.

    Ideally, one would use an html parser and grab the script tag, then use a json parser on the JavaScript object within the script tag.

    If you want the data from the other tabbed results, I can add that in by explicitly setting re.Global = True, then looping the returned matches. Depends whether you want those and how you want them to appear in the sheet(s).

    I currently write results out to a sheet called Treasury Notes & Bonds.


    Option Explicit
    
    Public Sub GetTradeData()
        Dim s As String, http As MSXML2.XMLHTTP60 'required reference Microsoft XML v6,
        
        Set http = New MSXML2.XMLHTTP60
    
        With http
            .Open "GET", "https://www.wsj.com/market-data/bonds/treasuries", False
            .setRequestHeader "User-Agent", "Mozilla/5.0"
            .setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
            .send
            s = .responseText
        End With
        
        Dim re As VBScript_RegExp_55.RegExp 'required reference Microsoft VBScript Regular Expressions
        
        Set re = New VBScript_RegExp_55.RegExp
        re.Pattern = "instruments"":\[(.*?)\]"
        s = re.Execute(s)(0).SubMatches(0)
        
        Dim headers() As Variant, r As Long, c As Long, mappingDict As Scripting.Dictionary 'required reference Microsoft Scripting Runtime
        
        Set mappingDict = New Scripting.Dictionary
        mappingDict.Add "maturityDate", "MATURITY"
        mappingDict.Add "coupon", "COUPON"
        mappingDict.Add "bid", "BID"
        mappingDict.Add "ask", "ASKED"
        mappingDict.Add "change", "CHG"
        mappingDict.Add "askYield", "ASKED YIELD"
        
        headers = mappingDict.keys
        
        Dim results() As String, output() As Variant, key As Variant
        
        results = Split(s, "}")
        ReDim output(1 To UBound(results), 1 To UBound(headers) + 1)
        
        For r = LBound(results) To UBound(results) - 1
            c = 1
            For Each key In mappingDict.keys
                re.Pattern = "" & key & """:""(.*?)"""
                output(r + 1, c) = re.Execute(results(r))(0).SubMatches(0)
                c = c + 1
            Next
        Next
        
        re.Pattern = "timestamp"":""(.*?)"""
        re.Global = True
        
        With ThisWorkbook.Worksheets("Treasury Notes & Bonds")
            
            .UsedRange.ClearContents
            
             Dim matches As VBScript_RegExp_55.MatchCollection
             
             Set matches = re.Execute(http.responseText)
            .Cells(1, 1) = matches(matches.Count - 1).SubMatches(0)
            .Cells(2, 1).Resize(1, UBound(headers) + 1) = headers
            .Cells(3, 1).Resize(UBound(output, 1), UBound(output, 2)) = output
        End With
    End Sub