vba, web-scraping, selectors-api, queryselector

Unable to use querySelector within querySelectorAll container in the right way


I'm trying to figure out how I can use .querySelector() on each item returned by .querySelectorAll().

For example, I get expected results when I try like this:

' Needs references to "Microsoft HTML Object Library" and "Microsoft XML, v6.0"
Sub GetContent()
    Const URL$ = "https://stackoverflow.com/questions/tagged/web-scraping?tab=Newest"
    Dim HTMLDoc As New HTMLDocument
    Dim HTML As New HTMLDocument, R&, I&
    
    ' Fetch the page synchronously and load the response into HTMLDoc
    With New XMLHTTP60
        .Open "Get", URL, False
        .send
        HTMLDoc.body.innerHTML = .responseText
    End With

    ' Load each matched node into a second document, then query inside it
    With HTMLDoc.querySelectorAll(".summary")
        For I = 0 To .Length - 1
            HTML.body.innerHTML = .Item(I).outerHTML
            R = R + 1: Cells(R, 1).Value = HTML.querySelector(".question-hyperlink").innerText
        Next I
    End With
End Sub

However, the same logic stops working when I switch to another site and try to grab the values under the Rank column of its table:

Sub GetContent()
    Const URL$ = "https://www.worldathletics.org/records/toplists/sprints/100-metres/outdoor/men/senior/2020?page=1"
    Dim HTMLDoc As New HTMLDocument
    Dim HTML As New HTMLDocument, R&, I&

    With New XMLHTTP60
        .Open "Get", URL, False
        .send
        HTMLDoc.body.innerHTML = .responseText
    End With

    With HTMLDoc.querySelectorAll("#toplists tbody tr")
        For I = 0 To .Length - 1
            HTML.body.innerHTML = .Item(I).outerHTML
            R = R + 1: Cells(R, 1).Value = HTML.querySelector("td").innerText
        Next I
    End With
End Sub

The line I'm talking about in both scripts is Cells(R, 1).Value = HTML.querySelector().innerText. In both cases I'm using it inside the same kind of container, .querySelectorAll().

If I use .querySelector() on .getElementsByTagName(), it works. I also had success using TagName on TagName, ClassName on ClassName, etc. So I can grab the content in a few different ways.

How can I use .querySelector() on .querySelectorAll() in the second script in order for it to work?


Solution

  • Wrap the row's outerHTML in table tags so the HTML parser knows what to do with it.

    HTML.body.innerHTML = "<table>" & .Item(I).outerHTML & "</table>"
    

    Doing so preserves the structure of the opening td tag, which is otherwise stripped of its "<", leaving no td element for .querySelector("td") to match.
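
    For context, here is a rough sketch of how the second script from the question might look with that one-line change applied. It is not tested against the live site. The Sub name GetRanks, the cell variable and the Nothing check are additions for illustration and are not part of the original answer. It assumes the same two references as the question's code (Microsoft HTML Object Library and Microsoft XML, v6.0).

```vb
' Sketch only: GetRanks and cell are hypothetical names, not from the original answer
Sub GetRanks()
    Const URL$ = "https://www.worldathletics.org/records/toplists/sprints/100-metres/outdoor/men/senior/2020?page=1"
    Dim HTMLDoc As New HTMLDocument
    Dim HTML As New HTMLDocument, R&, I&
    Dim cell As Object

    ' Fetch the page synchronously and load the response into HTMLDoc
    With New XMLHTTP60
        .Open "GET", URL, False
        .send
        HTMLDoc.body.innerHTML = .responseText
    End With

    With HTMLDoc.querySelectorAll("#toplists tbody tr")
        For I = 0 To .Length - 1
            ' Wrap the row in table tags so it is parsed in a table context
            HTML.body.innerHTML = "<table>" & .Item(I).outerHTML & "</table>"

            ' Added guard (assumption): skip rows where no td is found
            Set cell = HTML.querySelector("td")
            If Not cell Is Nothing Then
                R = R + 1: Cells(R, 1).Value = cell.innerText
            End If
        Next I
    End With
End Sub
```

    The guard is optional. It only keeps the loop from raising an error on a row that has no td cell.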