excel vba web-scraping

Get parent element that contains string


I'm trying to get the class name of the div element that contains the calling-area information, out of the many div elements on the page at the URL below. Since that div contains the names of US states, I'm using one state name as an anchor to identify it.

This is my current code (add references via Tools -> References: check "Microsoft XML, v3.0" and "Microsoft HTML Object Library"):

Public Sub Main()
    Dim url As String
    Dim oHttp As New MSXML2.XMLHTTP
    Dim divs As IHTMLElementCollection, div As HTMLDivElement

    ' Download the page synchronously
    url = "https://www.rebtel.com/en/international-calling-guide/phone-codes/us/"
    oHttp.Open "GET", url, False
    oHttp.send

    ' Load the response into an HTML document for DOM access
    Dim html As New HTMLDocument
    html.body.innerHTML = oHttp.responseText

    Set divs = html.body.getElementsByTagName("div")

    ' Print the class name of every div whose markup contains "Alabama"
    For Each div In divs
        If div.innerHTML Like "*Alabama*" Then
            Debug.Print div.className
        End If
    Next div
End Sub

The current output lists several divs that contain the string "Alabama". The site uses nested divs, so when one div contains "Alabama", its child divs may contain it too:

content-wrapper
codes_show_view
container
row gap-l
gap-xl-bottom gap-l-top
pull-left
pull-left
pull-left
pull-left

My desired output is only the div with class name gap-xl-bottom gap-l-top.

How can I identify the specific div that contains the calling area codes (class name gap-xl-bottom gap-l-top)?


Solution

  • You could do this, but I don't see how it can generalize to a different site...

    Public Sub Main()
        Dim url As String
        Dim oHttp As New MSXML2.XMLHTTP
        Dim divs As IHTMLElementCollection, div As Object, pDiv As Object

        ' Download the page synchronously
        url = "https://www.rebtel.com/en/international-calling-guide/phone-codes/us/"
        oHttp.Open "GET", url, False
        oHttp.send

        ' Load the response into an HTML document for DOM access
        Dim html As New HTMLDocument
        html.body.innerHTML = oHttp.responseText

        Set divs = html.body.getElementsByTagName("div")

        ' Stop at the first "pull-left" div that contains "Alabama"
        For Each div In divs
            If div.innerHTML Like "*Alabama*" And div.className = "pull-left" Then
                Set pDiv = div.parentElement.parentElement 'the element two levels up in the DOM tree
                Debug.Print pDiv.className  '>> gap-xl-bottom gap-l-top
                Exit For
            End If
        Next div
    End Sub
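
A possible way to avoid hard-coding the "pull-left" class is to look for the innermost div that contains the search string (one with no child div that also contains it) and then walk up through parentElement to inspect its ancestors. The following is only a sketch under assumptions: the function name InnermostDivContaining, the demo procedure, and the inline HTML with the class names outer, middle and inner are all hypothetical and are not taken from the site. It needs the "Microsoft HTML Object Library" reference.

```vba
Option Explicit

' Hypothetical helper: returns the first div whose markup contains needle
' and none of whose descendant divs contain it (the innermost match).
' Returns Nothing when no div matches.
Private Function InnermostDivContaining(ByVal doc As HTMLDocument, _
                                        ByVal needle As String) As Object
    Dim div As Object, child As Object
    Dim hasMatchingChild As Boolean

    For Each div In doc.getElementsByTagName("div")
        If InStr(1, div.innerHTML, needle, vbBinaryCompare) > 0 Then
            ' Check whether any nested div also contains the string
            hasMatchingChild = False
            For Each child In div.getElementsByTagName("div")
                If InStr(1, child.innerHTML, needle, vbBinaryCompare) > 0 Then
                    hasMatchingChild = True
                    Exit For
                End If
            Next child

            If Not hasMatchingChild Then
                Set InnermostDivContaining = div
                Exit Function
            End If
        End If
    Next div
End Function

' Demo on made-up markup (an assumption, not the real page structure)
Public Sub DemoInnermostDiv()
    Dim doc As New HTMLDocument
    doc.body.innerHTML = _
        "<div class=""outer""><div class=""middle"">" & _
        "<div class=""inner"">Alabama</div>" & _
        "<div class=""inner"">Alaska</div>" & _
        "</div></div>"

    Dim node As Object
    Set node = InnermostDivContaining(doc, "Alabama")

    ' Walk up the tree and print the class name of each enclosing div,
    ' starting with the innermost match itself
    Do While Not node Is Nothing
        If node.tagName = "DIV" Then Debug.Print node.className
        Set node = node.parentElement
    Loop
End Sub
```

On the real page you would still have to decide how many levels above the innermost match the wanted container sits, so this narrows the search rather than fully generalizing it.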