Search code examples
excelvbaweb-scrapingxml-parsingxmlhttprequest

Optimizing HTML Parsing in VBA


I have some VBA code that makes a request to https://nt3-s.zacks.com/fundamreports/Default.aspx and extracts all the data to a worksheet. My issue is that my parsing technique takes a very long time. Does anyone have any suggestions on how I can improve what the code below does? Maybe I should utilize QuerySelectorAll() or some other method...

Sub Financials(ticker, SheetName As String)
Dim XMLPage As New MSXML2.XMLHTTP60
Dim RP1, RP2, QP1, QP2, QP3, QP4, QP5, QP6 As String
Dim HTMLDoc As New MSHTML.HTMLDocument
Dim HTMLRow, HTMLCell As MSHTML.IHTMLElement
Dim i As Integer
Dim r, c, offset, shift As Integer

'Application.ScreenUpdating = False

shift = 50

' STANDARDIZED FINANCIAL STATEMENTS

' Generate income statements
'============================

XMLPage.Open "Post", "https://nt3-s.zacks.com/fundamreports/Default.aspx", False

' Header
XMLPage.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"

' Body Parameters
' Required parameters so every table value can be editted
RP1 = "__VIEWSTATE=%2FwEPDwUJNDg5NjgzODkyD2QWAgIDD2QWAgIBD2QWBAIPDxAPFgIeB0VuYWJsZWRnZGRkZAITDxAPFgIfAGdkZGRkGAEFHl9fQ29udHJvbHNSZXF1aXJlUG9zdEJhY2tLZXlfXxYBBQhidG5FeGNlbHC%2FJVg5ST2AoHtph1XAsQL0bObX"

' String is too long to be typed in one line...
a1 = "%2FwEWdALpx6COCwKm07CqBwKSh%2B%2FQCgL96MW%2BBgLN6MW%2BBgLav7PDDQLCv%2FPADQLNv%2FPADQLcv%2FPADQLfv%2FPADQLev%2FPADQLZv%2FPADQLYv%2FPADQLbv%2FPADQLav%2FPADQLFv%2FPADQLduMOkAwLq0v%2FjCQLq0uO8AQLq0teZCgLq0rvyAwLq0q%2FPDALq0pOoBALq0oeFDQLq0uvhBgLq0p%2BJAwLq0oPiDALX%2B53NDwLX%2B4GmBwLX%2B%2FUCAtf72d8JAtf7zbgBAtf7sZUKAtf7pe4DAtf7icsMAtf7vfIJAtf7oc8CApOYyLkHApOYvBICk5ig7wkCk5iUyAICk5j4pAoCk5jsgQMCk5jQ2gwCk5jEtwQCk5jo3gICk5jcuwoC%2BKHuog0C%2BKHS%2FwYC%2BKHG2A8C%2BKGqtQcC%2BKGeDgL4oYLrCQL4ofbHAgL4odqgCgL4oY7ICAL4ofIkAuW2jIwDAuW28OgMAuW25MUFAsTR38AHAvO744cNAvO7%2F9gFAvO7y%2F0OAvO7p5YHAvO7s6sIAvO7j0wC87ub4QkC87v3hQIC87uD7QcC87ufhggCzpKBqQsCzpKdwgMCzpLp5gQCzpLFuw0CzpLR3AUCzpKt8Q4CzpK5igcCzpKVrwgCzpKhlg0CzpK9qwYCivHU3Q"
a2 = "MCivGg9gQCivG8iw0CivGIrAYCivHkwA4CivHw5QcCivHMvggCivHYUwKK8fS6BgKK8cDfDgLhyPLGCQLhyM6bAgLhyNq8CwLhyLbRAwLhyILqBALhyJ6PDQLhyOqjBgLhyMbEDgLhyJKsDALhyO7ABAL835DoBwL83%2ByMCAL83%2FihAQLnzLDSBAL4zLDSBAL5zLDSBAL6zLDSBAL8zLDSBAL9zLDSBAK3wpPeDgKFwpPeDgKuh8zbAwKhh8zbAwLR2eHwDAKFt7SHCcQGh8eHqGGqe%2F6U9Wz%2FpSk9FF8d"

RP2 = "&__EVENTVALIDATION=" & a1 & a2

' Query Parameters
QP1 = "&tbTicker=" & ticker
QP2 = "&ddBasis=Q"
QP3 = "&ddPeriod=0"
QP4 = "&ddFrom=" & CStr(Year(DateAdd("yyyy", -10, Date)))
QP5 = "&ddTo=" & CStr(Year(Date))
QP6 = "&ddFormType=0"

XMLPage.send RP1 & RP2 & QP1 & QP2 & QP3 & QP4 & QP5 & QP6

HTMLDoc.body.innerHTML = XMLPage.responseText

r = 1
c = 1
offset = 0 * shift

Sheets(SheetName).Select

Cells.Select
Selection.ClearContents

'Extract data from table
For Each HTMLRow In HTMLDoc.getElementsByTagName("tr")
    For Each HTMLCell In HTMLRow.getElementsByTagName("th")
        Cells(r, c + offset).value = HTMLCell.innerText
        c = c + 1
    Next HTMLCell
    For Each HTMLCell In HTMLRow.getElementsByTagName("td")
        Cells(r, c + offset).value = HTMLCell.innerText
        c = c + 1
    Next HTMLCell
c = 1
r = r + 1
Next HTMLRow

' Generate balance statements
'======================================
'Request
XMLPage.Open "Post", "https://nt3-s.zacks.com/fundamreports/Default.aspx", False

' Header
XMLPage.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"

QP6 = "&ddFormType=1"

XMLPage.send RP1 & RP2 & QP1 & QP2 & QP3 & QP4 & QP5 & QP6

HTMLDoc.body.innerHTML = XMLPage.responseText

r = 1
c = 1
offset = 1 * shift

'Extract data from table
For Each HTMLRow In HTMLDoc.getElementsByTagName("tr")
    For Each HTMLCell In HTMLRow.getElementsByTagName("th")
        Cells(r, c + offset).value = HTMLCell.innerText
        c = c + 1
    Next HTMLCell
    For Each HTMLCell In HTMLRow.getElementsByTagName("td")
        Cells(r, c + offset).value = HTMLCell.innerText
        c = c + 1
    Next HTMLCell
c = 1
r = r + 1
Next HTMLRow

EDIT: I changed the code to include more details about my program. The sub will collect the income statements, balance sheets, cash flows, and other metrics for multiple companies at once. I included a couple of instances of how I'm using the POST method to parse the data.


Solution

  • By calculating with every iteration, add Application.Calculation = xlManual to the beginning of your macro and Application.Calculation = xlAutomatic to the end which will prevent updating calculations after every passing loop.