Search code examples
excelvbaweb-scrapinggetelementsbytagname

Web Scraping - StockCharts - getElementsByTagName ("a")


I am trying to get the inner text and href attribute of the column Name at this website:

https://stockcharts.com/freecharts/sectorsummary.html?&G=SECTOR_DJUSNS&O=1

but I get all hyperlinks except the ones inside the table.

Can somebody please take a look at this code and let me know what is wrong?

Sub Scraping_StockCharts()

    Dim XMLPage As New MSXML2.XMLHTTP60
    Dim HTMLDoc As New MSHTML.HTMLDocument
    Dim HTMLIm As MSHTML.IHTMLElement
    Dim HTMLIms As MSHTML.IHTMLElementCollection
    Dim URL As String
    
    URL = "https://stockcharts.com/freecharts/sectorsummary.html?&G=SECTOR_DJUSNS&O=1"
    
    XMLPage.Open "Get", URL, False
    XMLPage.setRequestHeader "Content-Type", "text/xml"
    
    XMLPage.send
    
    HTMLDoc.body.innerHTML = XMLPage.responseText

    Row = 1
    
    Set HTMLIms = HTMLDoc.getElementsByTagName("a")
        
    For Each HTMLIm In HTMLIms
        
        Sheets("Results").Cells(Row, 2).Value = HTMLIm.innerText
        Sheets("Results").Cells(Row, 3).Value = HTMLIm.getAttribute("href")
        Row = Row + 1
        
    Next HTMLIm

End Sub

Solution

  • Scraping is not allowed via xmlhttp. Not sure about automating a browser. You will need to read terms of service carefully. With browser automation you could just use the url you have I suspect.

    From an intellectual point of view the data can be got from https://c.stockcharts.com/j-sum/sum?cmd=perf&group=SECTOR_DJUSNS which will return JSON. From that json you can reconstruct the url by accessing the sym value for each dictionary in the list of dictionaries returned. Concatenate that sym value onto the end of base string of https://stockcharts.com/h-sc/ui?s=

    e.g. for first dictionary in list

    https://stockcharts.com/h-sc/ui?s=   + sym 
    

    gives

    https://stockcharts.com/h-sc/ui?s=TKAT
    

    Basically, the server expects a query string and returns json. The page uses this to update content. This can be viewed in network tab of browser when refreshing page.

    You might be better off looking for a free API that serves similar data.