Search code examples
excelvbaweb-scrapinghref

How to get the href attribute using excel vba


I want to retrieve the href attribute of the <h3> tags of an html page, I am able to get the innerText, but I don't know how to access the href attribute, there are several <h3> tags in the document, but for the time being I just need the first one. I will deal with the rest later...

This is the code I got so far

Sub Scrap()

Dim IE As New InternetExplorer
Dim sDD As String
Dim Doc As HTMLDocument

IE.Visible = True
IE.navigate "https://www.oneoiljobsearch.com/senior-reservoir-engineer-jobs/?page=1"
Do
DoEvents
Loop Until IE.readyState = READYSTATE_COMPLETE
Set Doc = IE.document
sDD = Trim(Doc.getElementsByTagName("h3")(0).innerText) 
'sDD contains the string "Senior Reservoir Engineer"
End Sub

Below is a portion of the HTML document to extract data from:

  <div class="front_job_details">

    <h3>
        <a href="/jobs/senior-reservoir-engineer-oslo-norway-7?cmp=js&from=job-search-form-2" target="_blank">

        Senior Reservoir Engineer

        </a>
    </h3>

The text I need to retrieve is: "/jobs/senior-reservoir-engineer-oslo-norway-7?cmp=js&from=job-search-form-2"

Thanks in advance for your help.


Solution

  • Try,

    dim hr as string
    
    hr = Doc.getElementsByTagName("h3")(0).getElementsByTagName("a")(0).href
    
    debug.print hr
    

    The getElementsByTagName collection is zero-based but the .Length (the # of H3's, called Count in other methods) is one-based.

    dim i as long
    
    for i=0 to Doc.getElementsByTagName("h3").length - 1
        debug.print Doc.getElementsByTagName("h3")(i).getElementsByTagName("a")(0).href
    next i
    

    This gets the first <A> tag from each H3. You could duplicate the method to get multiple A's from each H3.