Tags: excel, vba, innertext

Excel VBA: extract the text after an <a> tag


I am trying to extract the text that comes after each <a> tag.

This is the HTML:

<pre><a href="../">../</a>
<a href="view_10496.html">view_10496.html</a>     06-Feb-2021 01:54     60K
<a href="view_10498.html">view_10498.html</a>     06-Feb-2021 01:54     53K
<a href="view_10499.html">view_10499.html</a>     06-Feb-2021 01:54     26K
<a href="view_10500.html">view_10500.html</a>     06-Feb-2021 01:54     15K
<a href="view_10501.html">view_10501.html</a>     06-Feb-2021 01:54    128K

My code can pick up the content of each <a> tag, but I also want to extract the text after it (the date, time and size). The counter makes sure that I skip the first <a> tag (the "../" link).

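Dim alle_a_tags As Object, a_tag As Object
Dim teller As Long
Dim InnerHTML As String, InnerText As String, Href As String

Set alle_a_tags = ie.document.getElementsByTagName("a")

For Each a_tag In alle_a_tags
    
    ' Skip the first <a> tag (the "../" link)
    If teller = 0 Then
        GoTo Volgende_a_tag
    End If

    InnerHTML = a_tag.InnerHTML
    InnerText = a_tag.InnerText
    Href = a_tag.Href
    ' Date = ...   (the text after the tag, which I cannot reach yet)

Volgende_a_tag:
    teller = teller + 1
Next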

Solution

  • Based only on the HTML provided:

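    You can use the attribute "starts with" selector (^=) to match the beginning of the href value and so select the right <a> nodes, i.e. the ones that directly precede the text you want. Because the "../" link does not match, no counter is needed. Then move to each node's nextSibling to get the desired text, and use Select Case on that sibling's nodeType to decide which property to read: innerText for an element node (1), nodeValue for a text node (3).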

    Dim i As Long, nodes As Object, nextSibling As Object
    
    ' Only the <a> tags whose href starts with "view_" (this skips the "../" link)
    Set nodes = ie.document.querySelectorAll("[href^='view_']")
    
    For i = 0 To nodes.Length - 1
        Set nextSibling = nodes.Item(i).nextSibling
        If Not nextSibling Is Nothing Then
            'https://developer.mozilla.org/en-US/docs/Web/API/Node/nodeType
            Select Case nextSibling.NodeType
            Case 1 ' element node
                Debug.Print nextSibling.innerText
            Case 3 ' text node, e.g. "     06-Feb-2021 01:54     60K"
                Debug.Print nextSibling.NodeValue
            End Select
        End If
    Next
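    The text node printed above still contains the padding spaces and the line break from the <pre> listing. A minimal sketch for splitting it into date, time and size, assuming every line keeps the "date time size" layout shown in the HTML. ParseListing is a hypothetical helper name, not part of the original answer or of any library:

    ' Hypothetical helper (not from the original answer): turns the text that
    ' follows an <a> tag, e.g. "     06-Feb-2021 01:54     60K", into an array
    ' with (0) = date, (1) = time, (2) = size.
    ' Assumes the listing always uses the "date time size" layout shown above.
    Function ParseListing(ByVal s As String) As Variant
        ' Treat line breaks and tabs as ordinary spaces
        s = Replace(s, vbCr, " ")
        s = Replace(s, vbLf, " ")
        s = Replace(s, vbTab, " ")

        ' Collapse runs of spaces so that Split does not return empty items
        Do While InStr(s, "  ") > 0
            s = Replace(s, "  ", " ")
        Loop

        ' Drop the leading/trailing space and split on the single spaces left
        ParseListing = Split(Trim$(s), " ")
    End Function

    Inside Case 3 you could then write parts = ParseListing(nextSibling.NodeValue) and read parts(0), parts(1) and parts(2). Avoid naming your own variable Date: in VBA, Date = ... is the Date statement, which sets the system date. If you need a real Date value, CDate(parts(0) & " " & parts(1)) may work, but month names are interpreted according to the system's locale settings.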