Search code examples
vbaseleniumweb-scrapingselenium-chromedriverwait

How to optimize the wait method using VBA and Chromedriver


in this homepage "http://www.kpia.or.kr/index.php/year_sugub"

If you check the html, there are 6 id's from li1 to li6. The first thing I noticed after using chromedriver for the first time was that the wait method was ineffective. So I searched for various ways to optimize the wait after clicking on the internet for use on this homepage. For example, I've applied the following three kinds of coding.

ex1) Application.Wait Now + TimeSerial (0, 0, 5)    

ex2) .FindElementById ("li2", timeout: = 10000) .Click

ex3) 'Do 'DoEvents 'On Error Resume Next 'Set ele = .FindElementById ("li2") 'On Error GoTo 0 'If Timer - t = 10 Then Exit Do' <== To avoid infinite loop 'Loop While ele Is Nothing

However, we could not finally find a way to optimize the wait method without using Application.Wait Now + TimeSerial (0, 0, 5). This method is not fully loaded after clicking li2, but occasionally additional tasks are executed.

So, I thought of a formal coding logic that I could use occasionally to do similar coding in the future, and I came up with the following logic. For example, in li2, the Ethylene value is always a fixed value with the result value, so if you click on li2 and then look for the "SM" value, the data will be loaded into the sheet. Next, "LDPE" in li3 is the way to paste the data into the sheet after loading is complete. So I am coding with this idea, and I can not solve the error while I'm working on VBA.

Dim d As WebDriver, ws As Worksheet, clipboard As Object
Set d = New ChromeDriver
Set ws = ThisWorkbook.Worksheets("Sheet3")
Set clipboard = GetObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}")
Const URL = "http://www.kpia.or.kr/index.php/year_sugub"
Dim html As HTMLDocument

Set html = New HTMLDocument

With d
    .AddArgument "--headless"
    .Start "Chrome"
    .get URL, Raise:=False
rep:
    .FindElementById("li2", timeout:=10000).Click

    Dim Posts As WebElements
    Dim elem As WebElements
    Dim a1 As Integer

    For Each Posts In .FindElementsByClass("bbs")
        For Each elem In Posts.FindElementsByCss("td")
            If Not elem.Text = "SM" Is Nothing Then

html.body.innerHTML = d.PageSource

Dim tarTable As HTMLTable
Dim hTable As HTMLTable

For Each tarTable In html.getElementsByTagName("table")
    If InStr(tarTable.className, "bbs") <> 0 Then
    Set hTable = tarTable
    End If
Next

    clipboard.SetText .FindElementById("table_body").Attribute("outerText")
    clipboard.PutInClipboard

    else
    goto rep
    end if
    .Quit

End With

If it finds a value that matches the SM value, it assumes that the loading is completed and proceeds to transfer the related data to the clipboard. If the SM value is not found, use GOTO to use .FindElementById ("li2" timeout: = 10000). I thought I could fix it by creating a loop that restarts from .Click.

I am a beginner in the process of saving time and learning hard while reading, so I would really appreciate it if you could give me more help.

enter image description here


Solution

  • I would avoid using a browser at all and issue a XMLHTTP POST request and parse the XML response to write out to sheet. Do this in a loop over the gubun codes which cover each tab i.e. gubun=1 to 6.

    Option Explicit
    
    Public Sub GetTable()
        Dim sResponse As String, body As String, columnToWriteOut As Long, gubunNumber As Long
        Dim xmlDoc As Object
    
        Set xmlDoc = CreateObject("MSXML2.DOMDocument") 'New MSXML2.DOMDocument60
        columnToWriteOut = 1
    
        With CreateObject("MSXML2.XMLHTTP")
    
            For gubunNumber = 1 To 6
    
                body = "gubun=" & CStr(gubunNumber)
                .Open "POST", "http://www.kpia.or.kr/index.php/year_sugub/get_year_sugub", False
                .setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
                .setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36"
                .setRequestHeader "Content-Type", "application/x-www-form-urlencoded; charset=UTF-8"
                .setRequestHeader "Content-Length", Len(body)
                .send body
                sResponse = .responseText
    
                With xmlDoc
                    .validateOnParse = True
                    .setProperty "SelectionLanguage", "XPath"
                    .async = False
                    If Not .LoadXML(sResponse) Then
                        Err.Raise .parseError.ErrorCode, , .parseError.reason
                    End If
                End With
    
                Dim startYear As Long, endYear As Long, numColumns As Long, numRows As Long, data()
                Dim node As Object, nextNode As Object, headers(), i As Long
    
                startYear = xmlDoc.SelectSingleNode("//rec/sy").Text
                endYear = xmlDoc.SelectSingleNode("//rec/ey").Text
                numRows = xmlDoc.SelectNodes("//product").Length
    
                ReDim headers(1 To endYear - startYear + 3)
                numColumns = UBound(headers)
                ReDim data(1 To numRows, 1 To numColumns)
                headers(1) = "Product": headers(2) = "Category"
    
                For i = 1 To endYear - startYear + 1
                    headers(i + 2) = startYear + i - 1
                Next
    
                Dim r As Long, c As Long, rowCounter As Long
    
                rowCounter = 0
                For Each node In xmlDoc.SelectNodes("//rec")  ' '//rec/*[not(self::sy) and not(self::ey) and not(self::product)]  ?
                    c = 1: rowCounter = rowCounter + 1
                    For Each nextNode In node.ChildNodes
                        Select Case c
                        Case 3
                            data(rowCounter, 1) = nextNode.Text
                        Case Is > 3
                            data(rowCounter, c - 1) = nextNode.Text
                        End Select
    
                        Select Case rowCounter Mod 4
                        Case 1
                            data(rowCounter, 2) = "Production (shipment)"
                        Case 2
                            data(rowCounter, 2) = "Export"
                        Case 3
                            data(rowCounter, 2) = "income"
                        Case 0
                            data(rowCounter, 2) = "Domestic demand "
                        End Select
                        c = c + 1
                    Next
                Next
    
                With ThisWorkbook.Worksheets("Sheet1")
                    .Cells(1, columnToWriteOut).Resize(1, UBound(headers)) = headers
                    .Cells(2, columnToWriteOut).Resize(UBound(data, 1), UBound(data, 2)) = data
                End With
                columnToWriteOut = columnToWriteOut + UBound(headers) + 2
            Next
        End With
    End Sub
    

    Alternatively you can loop waiting for each Ajax call to complete:

    Option Explicit
    
    Public Sub GetInfo()
        Dim d As WebDriver, ws As Worksheet, clipboard As Object, writeOutColumn As Long
        writeOutColumn = 1
        Const URL = "http://www.kpia.or.kr/index.php/year_sugub"
    
        Set d = New ChromeDriver
        Set ws = ThisWorkbook.Worksheets("Sheet3")
        Set clipboard = GetObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}")
    
        With d
            .Start "Chrome"
            .get URL
    
            Dim links As Object, i As Long
            Set links = .FindElementsByCss("[href*=action_tab]")
    
            For i = 1 To links.Count
                If i > 1 Then
                    links(i).Click
                    Do
                    Loop While Not .ExecuteScript("return jQuery.active == 0")
                End If
                Dim table As Object
                Set table = .FindElementByTag("table")
                clipboard.SetText table.Attribute("outerHTML")
                clipboard.PutInClipboard
    
                ws.Cells(1, writeOutColumn).PasteSpecial
                writeOutColumn = writeOutColumn + table.FindElementByTag("tr").FindElementsByTag("td").Count + 2
                Set table = Nothing
            Next
            .Quit
        End With
    End Sub