Tags: excel, vba, web-scraping, excel-2007

How to get innerText of a tag in VBA excluding text from nested tags?


I am using VBA for web scraping. Below are the HTML structure and my VBA code.

When I run it, I get this text: ETA : 2020-08-26 (Reference only, the date will be updated according to shipments).

But I want to scrape only the date from it: 2020-08-26.

    <div style="font-size: 14px;">
        <span class="label" style="font-weight: bolder; font-size: 13px;">ETA : </span>
        <br>
        2020-08-26
        <span style="color: red; font-size: 12px;">(Reference only, the date will be updated according to shipments).</span>
    </div>

VBA code:

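    ' Requires references to Microsoft Internet Controls and
    ' Microsoft HTML Object Library
    Dim ie As New InternetExplorer
    Dim doc As HTMLDocument
    Dim elems As Object

    ie.navigate "http://127.0.0.1/wordpress/sample-page/"
    ' Wait until the page has finished loading
    Do
        DoEvents
    Loop Until ie.readyState = READYSTATE_COMPLETE

    Set doc = ie.document
    Set elems = doc.getElementsByTagName("div")
    MsgBox elems(33).innerText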

Solution

  • This code finds the first date of the form ####-##-## in the string.

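    Dim s As String, fd As String, aa As String
    Dim a() As String, i As Long, pos As Long

    Cells.Clear
    s = "ETA : 2020-08-26 (Reference only, the date will be updated according to shipments)."
    ' Build a mask of s in which every digit is replaced by "#"
    ReDim a(1 To Len(s))
    For i = 1 To Len(s)
        a(i) = IIf(Mid(s, i, 1) Like "#", "#", Mid(s, i, 1))
    Next i
    fd = "####-##-##"   ' date template to look for
    Cells(1, 1) = s
    aa = Join(a, "")
    Cells(2, 1) = aa
    ' The template's position in the mask is the date's position in s
    pos = InStr(aa, fd)
    If pos > 0 Then Cells(3, 1) = Mid(s, pos, Len(fd))
    Cells(3, 1).NumberFormat = "yyyy-mm-dd"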
    

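    First, the code copies the string character by character into an array, replacing every digit with #. It then joins the array back into a masked string and uses InStr to find the template fd in it. Because the mask has the same length as the original string, the position InStr returns is also the position of the date in s, so Mid can extract the actual date from there.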
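The same template idea can also be sketched as a small reusable function. This is a variant of the technique above, not the answer's own code: instead of building a masked copy of the string, it tests each 10-character window directly with the Like operator, where # matches any single digit. The names FirstDateText and DemoFirstDateText are hypothetical and chosen only for illustration.

```vba
' Hypothetical helper: returns the first substring of s shaped like
' ####-##-##, or an empty string when there is none.
Function FirstDateText(ByVal s As String) As String
    Const fd As String = "####-##-##"
    Dim i As Long

    For i = 1 To Len(s) - Len(fd) + 1
        ' Like treats each # in the pattern as "any single digit"
        If Mid$(s, i, Len(fd)) Like fd Then
            FirstDateText = Mid$(s, i, Len(fd))
            Exit Function
        End If
    Next i
End Function

Sub DemoFirstDateText()
    Dim s As String
    s = "ETA : 2020-08-26 (Reference only, the date will be updated according to shipments)."
    Debug.Print FirstDateText(s)   ' prints 2020-08-26
End Sub
```

In the question's code, the function could presumably be called as FirstDateText(elems(33).innerText). Like the original answer, it only checks the shape of the text, so a string such as 9999-99-99 would also match.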