Search code examples
vbaindexingweb-scrapingihtmldocument

VBA GetElementsById Method "Object Variable Not Set"


I'm trying to select the main menu ID of this page http://greyhoundstats.co.uk/index.php labeled ("menu_wholesome") in order to get their hyperlinks later on. In the HTML document, there are two tags with this ID, a <div> and its child element <ul>, but when i search for them with the code below, i get the object variable not set" error.

Option Explicit

Public Const MenuPage As String = "http://greyhoundstats.co.uk/index.php"

Sub BrowseMenus()

Dim XMLHTTPReq As New MSXML2.XMLHTTP60
Dim HTMLDoc As New MSHTML.HTMLDocument

Dim MainMenuList As MSHTML.IHTMLElement
Dim aElement As MSHTML.IHTMLElementCollection
Dim ulElement As MSHTML.IHTMLUListElement
Dim liElement As MSHTML.IHTMLLIElement

XMLHTTPReq.Open "GET", MenuPage, False
XMLHTTPReq.send

HTMLDoc.body.innerText = XMLHTTPReq.responseText

    Set MainMenuList = HTMLDoc.getElementById("menu_wholesome")(0) '<-- error happens here

End Sub

Anyone knows why getElementsById can't find the refered ID, although it is part of the HTML document set? I know that this method is supposed to return a unique ID, but when we have the same one refered by other tags i also know that i will return the first ID found which should be the <div id="menu_wholesome"> part of the HTML page being requested.


Solution

  • Firstly: You want to work and set the innerHTML as you intend to traverse a DOM document.

    Secondly: This line

    Set MainMenuList = HTMLDoc.getElementById("menu_wholesome")(0)
    

    It is incorrect. getElementById returns a single element which you cannot index into. You index into a collection.

    Please note: Both div and and ul lead to the same content.

    If you want to select them separately use querySelector

    HTMLDoc.querySelector("div#menu_wholesome")
    HTMLDoc.querySelector("ul#menu_wholesome")
    

    The above target by tag name first then the id attribute.

    If you want a collection of ids then use querySelectorAll to return a nodeList of matching items. Ids should be unique to the page but sometimes they are not!

    HTMLDoc.querySelectorAll("#menu_wholesome")
    

    You can then index into the nodeList e.g.

    HTMLDoc.querySelectorAll("#menu_wholesome").item(0)
    

    VBA:

    Option Explicit
    
    Public Const MenuPage As String = "http://greyhoundstats.co.uk/index.php"
    Sub BrowseMenus()
        Dim sResponse As String, HTMLDoc As New MSHTML.HTMLDocument
        Dim MainMenuList As Object, div As Object, ul As Object
    
        With CreateObject("MSXML2.XMLHTTP")
            .Open "GET", MenuPage, False
            .setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
            .send
            sResponse = StrConv(.responseBody, vbUnicode)
        End With
    
        sResponse = Mid$(sResponse, InStr(1, sResponse, "<!DOCTYPE "))
        HTMLDoc.body.innerHTML = sResponse
    
        Set MainMenuList = HTMLDoc.querySelectorAll("#menu_wholesome")
    
        Debug.Print MainMenuList.Length
    
        Set div = HTMLDoc.querySelector("div#menu_wholesome")
        Set ul = HTMLDoc.querySelector("ul#menu_wholesome")
    
        Debug.Print div.outerHTML
        Debug.Print ul.outerHTML
    
    End Sub