I've trying to web-scrape County, Population and Median Home Value using the following VBA code below. This works, but I'm encountering the following problem: The Item numbers (on block quotes below) vary, which means that for certain zip codes, I retrieve incorrect data points, see spreadsheet image below.
I've researched for variable Item numbers, but the only case that comes close is retrieving a collection, then picking out the element by class name. However, the HTML I'm working with doesn't have a class name, see below:
Is it possible to first find the string "County: " and then retrieve the innerText of TagName td? I've also considered returning the entire table, but the data are hosted within two different tables. As you can see, I have some ideas, but can't seem to find a good example to go off of, so I'm kinda lost on the code aspect of it.
Sub ZipCodeScrape()
Set ZipCodeRange = Range("C2", Range("C2").End(xlDown))
Dim IE As Object
Set IE = New InternetExplorer
Dim url As String
url = "https://www.unitedstateszipcodes.org/"
Dim County As String
Dim Population As String
Dim MedianHomeVal As String
Dim HTMLdoc As HTMLDocument
For Each cell In ZipCodeRange
IE.navigate (url & cell.Value)
'Allows IE to load
While IE.readyState <> 4
DoEvents
Wend
Set HTMLdoc = IE.document
County = HTMLdoc.getElementsByTagName("td").Item(2).innerText
Population = HTMLdoc.getElementsByTagName("td").Item(6).innerText
MedianHomeVal = HTMLdoc.getElementsByTagName("td").Item(12).innerText
cell.Offset(0, 1) = County
cell.Offset(0, 2) = Population
cell.Offset(0, 3) = MedianHomeVal
Next cell
End Sub
Try this. It will fetch you the expected output (county, population and median home value). I've set there few search options taken from your uploaded image. Btw, the search should be made with zip code otherwise the population
and median home value
doesn't show up in that webpage.
Sub ZipCodeScrape()
Dim IE As New InternetExplorer, html As HTMLDocument
Dim search_input As Variant, posts As Object, post As Object, elem As Object
With IE
.Visible = True
.navigate "https://www.unitedstateszipcodes.org/"
Do Until .readyState = READYSTATE_COMPLETE: Loop
Set html = .document
End With
Application.Wait Now + TimeValue("00:00:03")
For Each search_input In [{"32937","33056","33312","33844","34698"}]
html.getElementById("q").Value = search_input
html.getElementsByClassName("btn btn-danger")(0).Click
Application.Wait Now + TimeValue("00:00:05")
For Each posts In html.getElementsByTagName("th")
If InStr(posts.innerText, "County:") > 0 Then Row = Row + 1: Cells(Row, 1) = posts.NextSibling.innerText: Exit For
Next posts
For Each post In html.getElementsByTagName("th")
If InStr(post.innerText, "Population") > 0 Then Cells(Row, 2) = post.ParentNode.getElementsByTagName("td")(0).innerText: Exit For
Next post
For Each elem In html.getElementsByTagName("th")
If InStr(elem.innerText, "Median Home Value") > 0 Then Cells(Row, 3) = elem.ParentNode.getElementsByTagName("td")(0).innerText: Exit For
Next elem
Next search_input
IE.Quit
End Sub