I'm struggling to retrieve a set of links into a collection. All other page elements can be retrieved with the usual getElementsByTagName / getElementsByClassName / getElementById calls, except these. This is where a knowledge of HTML and JavaScript would pay dividends. My own guess is that the fault has something to do with the href being a JavaScript command, or with the links sitting inside a "clear" or "clearfix" class that hides them. My end goal is to be able to scrape the links from within the JavaScript hrefs.
Any help is appreciated. Thanks.
Public Function getNewsMAIN()
    ' Requires a reference to "Microsoft HTML Object Library" (MSHTML)
    ' for HTMLDocument, IHTMLElementCollection and IHTMLElement.
    Dim strURL As String: strURL = _
        "http://www.londonstockexchange.com/exchange/prices-and-markets/stocks/exchange-insight/company-news.html?fourWayKey=GB00BYN59130GBGBXSTMM"
    Dim HTMLDoc As New HTMLDocument
    Dim oXMLHTTP As Object

    ' Fetch the page synchronously
    Set oXMLHTTP = CreateObject("MSXML2.XMLHTTP.6.0")
    oXMLHTTP.Open "GET", strURL, False
    oXMLHTTP.send

    ' Load the response into the HTML document only if the request succeeded
    If oXMLHTTP.Status = 200 Then
        HTMLDoc.body.innerHTML = oXMLHTTP.responseText
    End If

    '//Various attempts at cornering the links (each Set overwrites the previous one)
    Dim myLinks As IHTMLElementCollection
    Dim myLink As IHTMLElement
    Set myLinks = HTMLDoc.getElementsByTagName("a") '("ul") ("li")
    Set myLinks = HTMLDoc.getElementsByClassName("newsArchive") '("newsContainer")
    Set myLink = HTMLDoc.getElementById("newsArchive")
End Function
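One way to test the guess above before looking at CSS classes is to check whether the text "newsContainer" appears in the raw response at all. Hiding an element with CSS does not remove it from the DOM, so getElementsByTagName("a") would still return hidden links if they were present in the HTML. A minimal sketch, reusing the same MSXML2.XMLHTTP.6.0 request; the function name ResponseHasNewsContainers is hypothetical:

```vba
' Hypothetical diagnostic: reports whether the raw MSXML response text
' contains the class name "newsContainer" anywhere.
Public Function ResponseHasNewsContainers(ByVal strURL As String) As Boolean
    Dim oXMLHTTP As Object

    Set oXMLHTTP = CreateObject("MSXML2.XMLHTTP.6.0")
    oXMLHTTP.Open "GET", strURL, False
    oXMLHTTP.send

    If oXMLHTTP.Status = 200 Then
        ' InStr returns 0 when the substring is not found
        ResponseHasNewsContainers = _
            (InStr(1, oXMLHTTP.responseText, "newsContainer", vbTextCompare) > 0)
    Else
        ResponseHasNewsContainers = False
    End If
End Function
```

Usage: `Debug.Print ResponseHasNewsContainers(strURL)`. A False result suggests the list items are not in the server's initial HTML, so no DOM call on HTMLDoc can find them. A True result is not conclusive, since the class name could also appear in inline CSS or script.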
In the HTML in question, the links are contained within
<li class="newsContainer"></li>
There are 40 per page.
Those links are part of additional content that is loaded into the page by JavaScript after the main page has loaded, so they won't be part of the content returned by MSXML. If you want to get content from a page like this, your best bet would be to try automating IE to load the page, and then collecting the links once the full page has rendered.
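A minimal sketch of that approach, using late-bound InternetExplorer.Application so no extra library references are needed. Assumptions: the function name GetNewsLinksViaIE is hypothetical, the page renders in IE9+ standards mode (so getElementsByClassName is available), and a 10-second wait is enough for the news items to be injected. It collects the raw href text, including the javascript: calls. Extracting the actual URL from those depends on the format of the script call, which isn't shown here. Microsoft has also since retired Internet Explorer, so this may not run on current Windows setups.

```vba
' Hypothetical sketch: let IE run the page's scripts, then collect the href
' of every <a> inside an element with class "newsContainer".
Public Function GetNewsLinksViaIE(ByVal strURL As String) As Collection
    Dim ie As Object, doc As Object
    Dim items As Object, anchors As Object
    Dim i As Long, j As Long
    Dim started As Single
    Dim href As Variant
    Dim result As New Collection

    Set ie = CreateObject("InternetExplorer.Application")
    ie.Visible = False
    ie.navigate strURL

    ' Wait for the initial page load (READYSTATE_COMPLETE = 4)
    Do While ie.Busy Or ie.readyState <> 4
        DoEvents
    Loop
    Set doc = ie.document

    ' The news items are added after load, so poll until some appear or
    ' about 10 seconds pass (Timer resets at midnight; ignored in this sketch)
    started = Timer
    Do
        DoEvents
        Set items = doc.getElementsByClassName("newsContainer")
        If items.Length > 0 Then Exit Do
    Loop While Timer - started < 10

    For i = 0 To items.Length - 1
        Set anchors = items.Item(i).getElementsByTagName("a")
        For j = 0 To anchors.Length - 1
            ' Raw attribute text, e.g. the "javascript:..." string
            href = anchors.Item(j).getAttribute("href")
            If Not IsNull(href) Then result.Add CStr(href)
        Next j
    Next i

    ie.Quit
    Set GetNewsLinksViaIE = result
End Function
```

Usage: `Set links = GetNewsLinksViaIE(strURL)`, then loop with `For Each link In links: Debug.Print link: Next link`. With 40 items per page, `links.Count` should be around 40 if each item holds one link.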