Search code examples
htmlvbaexcelgetelementsbytagnamegetelementsbyclassname

Excel VBA Web Scrape - getElementsbyTagName.Item Number isn't Constant


I've trying to web-scrape County, Population and Median Home Value using the following VBA code below. This works, but I'm encountering the following problem: The Item numbers (on block quotes below) vary, which means that for certain zip codes, I retrieve incorrect data points, see spreadsheet image below.

enter image description here

I've researched for variable Item numbers, but the only case that comes close is retrieving a collection, then picking out the element by class name. However, the HTML I'm working with doesn't have a class name, see below:

Is it possible to first find the string "County: " and then retrieve the innerText of TagName td? I've also considered returning the entire table, but the data are hosted within two different tables. As you can see, I have some ideas, but can't seem to find a good example to go off of, so I'm kinda lost on the code aspect of it.

enter image description here

Sub ZipCodeScrape()

Set ZipCodeRange = Range("C2", Range("C2").End(xlDown))

Dim IE As Object
Set IE = New InternetExplorer

Dim url As String
url = "https://www.unitedstateszipcodes.org/"

Dim County As String
Dim Population As String
Dim MedianHomeVal As String
Dim HTMLdoc As HTMLDocument

For Each cell In ZipCodeRange

    IE.navigate (url & cell.Value)

    'Allows IE to load
    While IE.readyState <> 4
        DoEvents
    Wend

    Set HTMLdoc = IE.document

    County = HTMLdoc.getElementsByTagName("td").Item(2).innerText  
    Population = HTMLdoc.getElementsByTagName("td").Item(6).innerText  
    MedianHomeVal = HTMLdoc.getElementsByTagName("td").Item(12).innerText

    cell.Offset(0, 1) = County
    cell.Offset(0, 2) = Population
    cell.Offset(0, 3) = MedianHomeVal

Next cell

End Sub

Solution

  • Try this. It will fetch you the expected output (county, population and median home value). I've set there few search options taken from your uploaded image. Btw, the search should be made with zip code otherwise the population and median home value doesn't show up in that webpage.

    Sub ZipCodeScrape()
        Dim IE As New InternetExplorer, html As HTMLDocument
        Dim search_input As Variant, posts As Object, post As Object, elem As Object
    
        With IE
            .Visible = True
            .navigate "https://www.unitedstateszipcodes.org/"
            Do Until .readyState = READYSTATE_COMPLETE: Loop
            Set html = .document
        End With
    
        Application.Wait Now + TimeValue("00:00:03")
    
        For Each search_input In [{"32937","33056","33312","33844","34698"}]
    
            html.getElementById("q").Value = search_input
            html.getElementsByClassName("btn btn-danger")(0).Click
            Application.Wait Now + TimeValue("00:00:05")
    
            For Each posts In html.getElementsByTagName("th")
                If InStr(posts.innerText, "County:") > 0 Then Row = Row + 1: Cells(Row, 1) = posts.NextSibling.innerText: Exit For
            Next posts
            For Each post In html.getElementsByTagName("th")
                If InStr(post.innerText, "Population") > 0 Then Cells(Row, 2) = post.ParentNode.getElementsByTagName("td")(0).innerText: Exit For
            Next post
            For Each elem In html.getElementsByTagName("th")
                If InStr(elem.innerText, "Median Home Value") > 0 Then Cells(Row, 3) = elem.ParentNode.getElementsByTagName("td")(0).innerText: Exit For
            Next elem
        Next search_input
        IE.Quit
    End Sub