Search code examples
excelvbaweb-scrapingserverxmlhttp

The script works well when I use MSXML2.XMLHTTP.6.0, but it fails miserably when I switch to MSXML2.serverXMLHTTP.6.0


I've created a script in VBA to scrape breadcrumbs from a webpage using XML HTTP requests. The script works well when I implement the ActiveX component MSXML2.XMLHTTP.6.0, but it fails miserably when I switch to MSXML2.serverXMLHTTP.6.0.

As I've got a plan to use proxies within the script, it is necessary that I stick with MSXML2.serverXMLHTTP.6.0. However, the seond script doesn't work. To let you know, when I print the .responseText, I see gibberish content within it, as in the following:

??GN?!v:h??_??Og<]?????X ?6??'o??F??6 ?uh????x?r???????sP??????????[B??k????]??????yC????'???L???????,*?Z????? ?vX ?c?q\t?j??????K?|???P 7??k?y?<;?>????a?*P1????w???[?T?/f?? ?7?gn??V<E?Z??6t:??1??????E'v?1?? ?w??+??????-aD????wy?.

Using MSXML2.XMLHTTP.6.0 (works flawlessly):

Option Explicit
Sub GrabInfo()
    Const Url$ = "https://www.amazon.com/gp/product/B00FQT4LX2?th=1"
    Dim oHttp As Object, Html As HTMLDocument, breadCrumbs$

    Set Html = New HTMLDocument
    Set oHttp = CreateObject("MSXML2.XMLHTTP.6.0")

    With oHttp
        .Open "GET", Url, True
        .setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36"
        .send
        While .readyState < 4: DoEvents: Wend
        MsgBox "Status code: " & .Status
        Html.body.innerHTML = .responseText
        breadCrumbs = Html.querySelector("#wayfinding-breadcrumbs_feature_div")
        MsgBox breadCrumbs
    End With
End Sub

Using MSXML2.serverXMLHTTP.6.0 (throwing an error showing Object Variable or With block variable not set):

Option Explicit
Sub GrabInfo()
    Const Url$ = "https://www.amazon.com/gp/product/B00FQT4LX2?th=1"
    Dim oHttp As Object, Html As HTMLDocument, breadCrumbs$

    Set Html = New HTMLDocument
    Set oHttp = CreateObject("MSXML2.serverXMLHTTP.6.0")

    With oHttp
        .Open "GET", Url, True
        .setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36"
        .send
        While .readyState < 4: DoEvents: Wend
        MsgBox "Status code: " & .Status
        Html.body.innerHTML = .responseText
        breadCrumbs = Html.querySelector("#wayfinding-breadcrumbs_feature_div")
        MsgBox breadCrumbs
    End With
End Sub

How can I make the second script built on MSXML2.serverXMLHTTP.6.0 work?


Solution

  • I tried your code and got the same "glibberish" response. However, removing the statement setRequestHeader (where you set the user agent) solved that issue and the response was readable, while using a different User agent (the one suggested here) resulted in the strange response.

    Note that the result of Html.querySelector is an object, not a string and you should use

    Dim breadCrumbs as Object        
    Set breadCrumbs = Html.querySelector("#wayfinding-breadcrumbs_feature_div")
    MsgBox breadCrumbs.innerHTML