Search code examples
xmlvbaweb-scrapingxmlhttprequestmsxml2

Using getElementsByClassName with VBA


I wrote the following code in order to retrieve data from an API and hopefully get the content of a specific element. I wrote similar code that works with web pages (not API) but in this example it breaks and don't understand why.

Sub parseXML()
Dim xmldoc As Object
Dim obj As Object
Dim MyRequest As Object

Set MyRequest = CreateObject("MSXML2.XMLHTTP")
MyRequest.Open "GET", 
"https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi? db=pubmed&term=ABL1%20Acute%20granulocytic%20leukemia"     
 MyRequest.send
 While Not MyRequest.readyState = 4
       DoEvents
    Wend
Set xmldoc = MyRequest.responseXML
Set obj = xmldoc.DocumentElement.getElementsByClassName("ui-ncbihistogram-display-area")(0) 'Error: object doesn't support this property or method 
End Sub 

How should I write the last line in order to remove the error? I am trying to get the element class="ui-ncbihistogram-display-area". It has several <li>elements and at the end I want to retrieve the inner text of each of these<li> elements.


Solution

  • You are trying to use the syntax that would work for an HTML Parser selecting off the pubmed search for the same term i.e. you would be selecting for the histogram on this page.

    However, with the API call you are making you are getting an XML document which will have specified fields according the API documentation. Instead, you would use xpath to specify what you want if using xml parser (this will allow you to include more specificity in terms of attributes and parent child relationships). If you are going to use HTML parser then you want to look at selecting by tag. The field present in the webpage, for the histogram, is not present in the response. Nor is that information present in the return xml document. The figure that sort of matches is you get the total result count for the query i.e. 248. Though it is broken down by year in the webpage histogram whereas it is a total in the XML API response.

    Sample comparison:

    enter image description here

    I would recommend using an XML parser for XML. You then get the expected result hopefully. Compare the two parsers here where there is loss of content in the HTMLDocument.

    Public Sub DifferentParsing()
        Dim xmlDoc As Object, htmlDoc As HTMLDocument, url As String
    
        Set xmlDoc = CreateObject("MSXML2.DOMDocument") 'New MSXML2.DOMDocument60
        Set htmlDoc = New HTMLDocument
        url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?%20db=pubmed&term=ABL1%20Acute%20granulocytic%20leukemia"
    
        With CreateObject("MSXML2.XMLHTTP")
            .Open "GET", url, False
            .send
            htmlDoc.body.innerHTML = .responseText
            xmlDoc.LoadXML .responseXML.XML
        End With
        Debug.Print xmlDoc.SelectSingleNode("//Count").Text
        Debug.Print htmlDoc.getElementsByTagName("Count")(0).innerText
    End Sub