I am trying to get the inner text and href attribute of the column Name at this website:
https://stockcharts.com/freecharts/sectorsummary.html?&G=SECTOR_DJUSNS&O=1
but I get all hyperlinks except the ones inside the table.
Can somebody please take a look at this code and let me know what is wrong?
Sub Scraping_StockCharts()
Dim XMLPage As New MSXML2.XMLHTTP60
Dim HTMLDoc As New MSHTML.HTMLDocument
Dim HTMLIm As MSHTML.IHTMLElement
Dim HTMLIms As MSHTML.IHTMLElementCollection
Dim URL As String
URL = "https://stockcharts.com/freecharts/sectorsummary.html?&G=SECTOR_DJUSNS&O=1"
XMLPage.Open "Get", URL, False
XMLPage.setRequestHeader "Content-Type", "text/xml"
XMLPage.send
HTMLDoc.body.innerHTML = XMLPage.responseText
Row = 1
Set HTMLIms = HTMLDoc.getElementsByTagName("a")
For Each HTMLIm In HTMLIms
Sheets("Results").Cells(Row, 2).Value = HTMLIm.innerText
Sheets("Results").Cells(Row, 3).Value = HTMLIm.getAttribute("href")
Row = Row + 1
Next HTMLIm
End Sub
Scraping is not allowed via xmlhttp. Not sure about automating a browser. You will need to read terms of service carefully. With browser automation you could just use the url you have I suspect.
From an intellectual point of view the data can be got from https://c.stockcharts.com/j-sum/sum?cmd=perf&group=SECTOR_DJUSNS
which will return JSON. From that json you can reconstruct the url by accessing the sym value for each dictionary in the list of dictionaries returned. Concatenate that sym
value onto the end of base string of https://stockcharts.com/h-sc/ui?s=
e.g. for first dictionary in list
https://stockcharts.com/h-sc/ui?s= + sym
gives
https://stockcharts.com/h-sc/ui?s=TKAT
Basically, the server expects a query string and returns json. The page uses this to update content. This can be viewed in network tab of browser when refreshing page.
You might be better off looking for a free API that serves similar data.