Search code examples
vbaselenium-webdriver

SeleniumBasic: VBA Webscrape FindElementByXPath grabs Class from the wrong tr block


My code scrapes the highlighted texts from here (see the code for the details of the website): Farnell

My code cycles through all three ProductRows to grab the appropriate MPN, Supplier_ID, QTY_InStock & Pkg Option. But what happens is the code does the first Product Row fine, then from the second row on it always returns Pkg Option of the First Row for both Row 2 & 3. Results are given by the MsgBox() at the end.

But why does it do that??? All the other attributes collected are the appropriate ones to the actual row.

The mistake happens at:

Prod_Pkg = Trim(ProductRow.FindElementByXPath("//*[@class='attachmentText cuttapePara']").text)

...in here:

Option Explicit

Sub Test()
    
    Dim driver As WebDriver
    Set driver = New Selenium.ChromeDriver
    
    driver.Start
    driver.Get ("https://uk.farnell.com")
    
    Dim Component_To_Find As String
    Component_To_Find = "C5750X7R2E105K230KA"   ' MPN - Manufacturer Part Number of a component
    
    Dim Farnell_Search_Box As WebElement
    Dim Farnell_Search_Button As WebElement
    
    Dim KeyPad As Selenium.keys
    Set KeyPad = New Selenium.keys
    
    Set Farnell_Search_Box = driver.FindElementByName("st")
    Set Farnell_Search_Button = driver.FindElementById("searchMain")
    
    Farnell_Search_Box.SendKeys (Component_To_Find)
    Farnell_Search_Button.submit
    
    Dim ProdList As WebElements
    Dim ProductRow As WebElement
    Dim InStock As String
    Dim Prod_Row_Comp_Name As String
    Dim Prod_Pkg As String
    Dim Supplier_ID As String
    'Dim Comp_Name As String
    Dim i As Integer
    
    'On Error Resume Next
    Set ProdList = driver.FindElementById("sProdList").FindElementByTag("tbody").FindElementsByTag("tr")
    'On Error GoTo 0
    If Not ProdList Is Nothing Then
        'For Each ProductRow In ProdList
        For i = 1 To ProdList.Count
            Set ProductRow = ProdList(i)
            
            'On Error Resume Next
            Prod_Row_Comp_Name = Trim(ProductRow.FindElementByTag("a").text)                                    ' Get MPN
            Prod_Pkg = Trim(ProductRow.FindElementByXPath("//*[@class='attachmentText cuttapePara']").text)     ' Get Packaging Option
            InStock = ProductRow.FindElementByClass("availability").FindElementByClass("inStockBold").text      ' Get QTY In Stock
            Supplier_ID = ProductRow.FindElementByClass("sku", -1, True).text                                   ' Get Supplier ID of Component
            'On Error GoTo 0
            
            MsgBox ("ProductRow:" & vbTab & i & vbCrLf & _
                    "MPN:" & vbTab & vbTab & Prod_Row_Comp_Name & vbCrLf & _
                    "Pkg:" & vbTab & vbTab & Prod_Pkg & vbCrLf & _
                    "QTY_InStock:" & vbTab & InStock & vbCrLf & _
                    "Supp_ID:" & vbTab & vbTab & Supplier_ID)
            
        Next i
    End If
End Sub

EDIT 1 With some help from here, I have tried a different method to get the WebElement of the failing readout.

By replacing this line:

Prod_Pkg = Trim(ProductRow.FindElementByXPath("//*[@class='attachmentText cuttapePara']").text)     ' Get Packaging Option

With this one:

Prod_Pkg = Trim(ProductRow.FindElementByXPath("//*[@class='pfIcon pfPackOptIcon_TC']//parent::div").text)     ' Get Packaging Option

It still doesn't read the appropriate Pkg Option to the actual ProductRow (Row 2 & 3). Row 2 & 3 always return the data from Row 1, which is "Cut Tape".

Edit 2

I have also tried this, from the same page as the above, an answer by someone else of how to get the parent of a WebElement:

Prod_Pkg = ProductRow.FindElementByXPath("//*[@class='pfIcon pfPackOptIcon_TC']").FindElementByXPath("..").text     ' Get Packaging Option

Still not working as expected.


Solution

  • I cannot really answer why FindElementByXPath finds the first entry of the whole page instead of only the table row. It's hard to find any documentation about that, the examples I found always use the command only on page level.

    However what works is to use FindElementByClass or FindElementsByClass. I prefer to use FindElementsByClass even if there is only one matching element - simply because it returns an empty list if no element was found while FindElementByClass throws a runtime error. Both functions find elements even if they have more than one class (in your example attachmentText and cuttapePara.
    I also prefer to write the result into intermediate variables before extracting the text. That helps to debug code and IMHO makes it more readable.

            Dim pkElems As WebElements
            Set pkElems = ProductRow.FindElementsByClass("cuttapePara")
            If pkElems.Count > 0 Then
                Prod_Pkg = pkElems(1).Text
            Else
                Prod_Pkg = "(not found)"
            End If