Search code examples
htmlexcelvbaweb-scrapingsimple-html-dom

How to pull the image and title of the product from Amazon?


I am trying to make a list of products based on the unique product codes of Amazon.

For example: https://www.amazon.in/gp/product/B00F2GPN36

Where B00F2GPN36 is the unique code.

I want to fetch the image and the title of the product into an Excel list under the columns product image and product name.

I have tried html.getElementsById("productTitle") and html.getElementsByTagName.

I also have doubt on what kind of variable to describe for storing the above mentioned info as I have tried declaration of Object type and HtmlHtmlElement.

I tried to pull the html doc and use it for the data search.

Code:

Enum READYSTATE
     READYSTATE_UNINITIALIZED = 0
     READYSTATE_LOADING = 1
     READYSTATE_LOADED = 2
     READYSTATE_INTERACTIVE = 3
     READYSTATE_COMPLETE = 4
End Enum

Sub parsehtml()

     Dim ie As InternetExplorer
     Dim topics As Object
     Dim html As HTMLDocument

     Set ie = New InternetExplorer
     ie.Visible = False
     ie.navigate "https://www.amazon.in/gp/product/B00F2GPN36"

     Do While ie.READYSTATE <> READYSTATE_COMPLETE
       Application.StatusBar = "Trying to go to Amazon.in...."
       DoEvents    
     Loop

     Application.StatusBar = ""
     Set html = ie.document
     Set topics = html.getElementsById("productTitle")
     Sheets(1).Cells(1, 1).Value = topics.innerText
     Set ie = Nothing

End Sub

I expect the output to be that in cell A1:
"Milton Thermosteel Carafe Flask, 2 litres, Silver" should reflect (without quotation marks) and similarly I want to pull the image as well.

But there is always some error like:
1. Run-time error '13':
Type mismatch when I used "Dim topics As HTMLHtmlElement"
2. Run-time error '438':
Object doesn't support this property or method

Note: I added references from Tools > References i.e. the required libraries.


Solution

  • Faster would be to use xhr and avoid browser and write out results from an array to sheet

    Option Explicit
    Public Sub GetInfo()
        Dim html As HTMLDocument, results()
        Set html = New HTMLDocument
        With CreateObject("MSXML2.XMLHTTP")
            .Open "GET", "https://www.amazon.in/gp/product/B00F2GPN36", False
            .send
            html.body.innerHTML = .responseText
            With html
                results = Array(.querySelector("#productTitle").innerText, .querySelector("#landingImage").getAttribute("data-old-hires"))
            End With
        End With
        With ThisWorkbook.Worksheets("Sheet1")
            .Cells(1, 1) = results(0)
            Dim file As String
            file = DownloadFile("C:\Users\User\Desktop\", results(1))  'your path to download file
            With .Pictures.Insert(file)
                .Left = ThisWorkbook.Worksheets("Sheet1").Cells(1, 2).Left
                .Top = ThisWorkbook.Worksheets("Sheet1").Cells(1, 2).Top
                .Width = 75
                .Height = 100
                .Placement = 1
            End With
        End With
        Kill file
    End Sub