javascript html vba excel mshtml

HTML anchor elements hidden from VBA


I'm struggling to retrieve a set of links into a collection. Every other element on the page responds to the usual get methods (getElementsByTagName, getElementsByClassName, getElementById), but these do not. This is where a knowledge of HTML and JavaScript would pay dividends. My guess is that the fault lies either in the href being a JavaScript command, or in the links sitting behind a "clear" or "clearfix" class that hides them. My end goal is to scrape the links from within the JavaScript hrefs.

Any help is appreciated. Thanks

Public Function getNewsMAIN()

    Dim strURL As String: strURL = _
        "http://www.londonstockexchange.com/exchange/prices-and-markets/stocks/exchange-insight/company-news.html?fourWayKey=GB00BYN59130GBGBXSTMM"
    Dim HTMLDoc As New HTMLDocument '// needs a reference to Microsoft HTML Object Library

    '// Fetch the raw page source with a synchronous GET
    Dim oXMLHTTP As Object
    Set oXMLHTTP = CreateObject("MSXML2.XMLHTTP.6.0")
    oXMLHTTP.Open "GET", strURL, False
    oXMLHTTP.send
    If oXMLHTTP.Status <> 200 Then Exit Function '// nothing to parse
    HTMLDoc.body.innerHTML = oXMLHTTP.responseText

    '// Various attempts at cornering the links (none of them finds the news links)
    Dim myLinks As IHTMLElementCollection
    Dim myLink As IHTMLElement
    Set myLinks = HTMLDoc.getElementsByTagName("a") '("ul") ("li")
    Set myLinks = HTMLDoc.getElementsByClassName("newsArchive") '("newsContainer")
    Set myLink = HTMLDoc.getElementById("newsArchive")

End Function

The HTML in question: the links are contained within

<li class="newsContainer"></li>

There are 40 per page.

Sample HTML


Solution

  • Those links are part of additional content that is loaded into the page after the main page has loaded, so they are not in the response text that MSXML returns. To get content from a page like this, your best bet is to automate IE to load the page and then collect the links once the full page has rendered.
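
A minimal sketch of that approach, assuming Internet Explorer is still available on the machine and that the links sit inside the li elements with class "newsContainer" as described in the question. The function name GetNewsLinks and the 30-second timeout are illustrative choices, not part of the original code. It waits for the page to load, then keeps polling until the late-loaded list items appear or the timeout expires.

```vba
Option Explicit

' Hypothetical helper: loads the page in IE and returns the href of every
' anchor found inside <li class="newsContainer"> elements.
Public Function GetNewsLinks(ByVal strURL As String) As Collection
    Dim ie As Object            ' late-bound InternetExplorer.Application
    Dim listItem As Object
    Dim anchor As Object
    Dim found As New Collection
    Dim deadline As Date

    Set ie = CreateObject("InternetExplorer.Application")
    ie.Visible = False
    ie.navigate strURL

    ' Wait (up to an assumed 30 seconds) for the main page to finish loading.
    ' readyState 4 is READYSTATE_COMPLETE.
    deadline = DateAdd("s", 30, Now)
    Do While (ie.Busy Or ie.readyState <> 4) And Now < deadline
        DoEvents
    Loop

    ' The news items arrive after the main page, so poll the rendered
    ' document until at least one link turns up or the deadline passes.
    Do While Now < deadline
        For Each listItem In ie.document.getElementsByTagName("li")
            If InStr(1, " " & listItem.className & " ", " newsContainer ", vbTextCompare) > 0 Then
                For Each anchor In listItem.getElementsByTagName("a")
                    found.Add CStr(anchor.href)
                Next anchor
            End If
        Next listItem
        If found.Count > 0 Then Exit Do
        DoEvents
    Loop

    ie.Quit
    Set GetNewsLinks = found
End Function
```

Calling Set links = GetNewsLinks(strURL) with the URL from the question should give a Collection of href strings. If the hrefs are javascript: commands, as the question suspects, each string would still need parsing to pull out the real target. An empty Collection means nothing matched before the timeout.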