Search code examples
htmlvbahttpweb-scraping

Scrape data from xmlhttp


I'm trying to scrape elements from xmlhttp.

I can import the html into a cell.

I would like to import specifically, the name, id, price and stock level.

Code to import the data:

Private Sub HTML_VBA_Excel()
    
    Dim oXMLHTTP    As Object
    Dim sPageHTML   As String
    Dim sURL        As String
    
    'Change the URL before executing the code
    sURL = "https://www.superdrug.com/Make-Up/Lips/Lip-Kits/Flower-Beauty-Mix-N%27-Matte-Lipstick-Duo-Tickled-Pink-687/p/769466"
        
    'Extract data from website to Excel using VBA
    Set oXMLHTTP = CreateObject("MSXML2.ServerXMLHTTP")
    oXMLHTTP.Open "GET", sURL, False
    oXMLHTTP.send
    sPageHTML = oXMLHTTP.responseText
    
    'Get webpage data into Excel
    sh02.Cells(1, 1) = sPageHTML
    
End Sub

Solution

  • You cannot extract the information reliably from an xmlhttp request issued against the url you show as the content is javascript loaded and will not have run.

    Not sure how sustainable the token is (doesn't seem to matter the value used) but you can join the productid, which is the end of your url, with the ajax token present in the page and issue and xmlhttp request using querystring parameters and parse a json response for the items of interest. I use jsonconverter.bas. After downloading and installing the .bas you need to go VBE > Tools > References and add a reference to Microsoft Scripting Runtime.

    Some testing seems to indicate any number can be added after the hyphen in place of the token so you could randomly generate a number on the fly to use.

    It's worth noting you can comma separate multiple products in the query string and thus do a bulk request. You would need then do a For Each Loop over the collection of dictionaries returned.

    Option Explicit
    
    Public Sub GetInfo()
        Const URL As String = "https://www.superdrug.com/micrositeProduct/bulk/769466-1548702898380"
        Dim json As Object, title As String, price As String, stocking As String, id As String
        With CreateObject("MSXML2.XMLHTTP")
            .Open "GET", URL, False
            .Send
            Set json = jsonconverter.ParseJson(.responsetext)(1)
        End With
    
        title = json("name")
        price = json("price")("formattedValue") 'json("price")("value")
        stocking = json("stockLevel")
        id = json("code")
    End Sub
    

    If you use a browser then the json string is present within one the script tags as the .innerHTML and you can easily extract from there.