I've written a VBA script that scrapes the first post from a website through a proxied request. For each HTTP request the script uses one proxy from a list and checks how many posts come back. As soon as a request succeeds, the script should write out the proxy in use and the first post, then exit the loop.
Sometimes the script works correctly, but most of the time it takes ages to finish, even though I set the timeout before sending the request. At this point I doubt that I've filled in the timeout parameters correctly. What I expect is that the script waits up to that time for a response; otherwise it should raise a timeout error and move on to the next request.
This is what I've written so far:
    Sub HandleTimeOut()
        Dim Http As New ServerXMLHTTP60, Html As New HTMLDocument
        Dim elem As Object, proxyList As Variant, oProxy As Variant

        proxyList = [{"50.246.120.125:8080","198.204.253.115:3128","98.172.142.99:8080","207.188.231.141:8080"}]

        For Each oProxy In proxyList
            With Http
                .Open "GET", "https://stackoverflow.com/questions/tagged/web-scraping", True
                .setRequestHeader "User-Agent", "Mozilla/5.0"
                .setProxy 2, oProxy
                .setTimeouts 600000, 600000, 15000, 15000
                On Error Resume Next
                .send
                While .readyState < 4: DoEvents: Wend
                Html.body.innerHTML = .responseText
                Set elem = Html.querySelectorAll(".summary .question-hyperlink")
                On Error GoTo 0
            End With

            If elem.Length > 0 Then
                [A1] = oProxy
                [B1] = elem(0).innerText
                Exit For
            End If
        Next oProxy
    End Sub
What is the right way to set the timeout to five seconds?
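The line

    .Open "GET", "https://stackoverflow.com/questions/tagged/web-scraping", True

should be

    .Open "GET", "https://stackoverflow.com/questions/tagged/web-scraping", False

The third argument to .Open is the async flag. With False the request is synchronous, so .send blocks until a response arrives or one of the timeouts passed to .setTimeouts expires.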
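To make the five-second case concrete, here is a minimal sketch of a synchronous request with every timeout set to 5000 ms. The helper name TryFetch, its parameters, and the Demo routine are hypothetical and not from the original post. It assumes late binding through CreateObject and calls setTimeouts and setProxy before Open. The four arguments of setTimeouts are the resolve, connect, send, and receive timeouts in milliseconds, and each stage is timed separately, so a single attempt can still take longer than five seconds overall.

```vba
' Hypothetical helper (not from the original post): returns the response
' body, or an empty string if the request fails or times out.
Function TryFetch(ByVal url As String, ByVal proxy As String, _
                  ByVal timeoutMs As Long) As String
    Dim req As Object
    ' A fresh request object per attempt, so no state leaks between proxies.
    Set req = CreateObject("MSXML2.ServerXMLHTTP.6.0")

    On Error GoTo Failed
    With req
        ' resolve, connect, send, receive timeouts (all in milliseconds)
        .setTimeouts timeoutMs, timeoutMs, timeoutMs, timeoutMs
        .setProxy 2, proxy              ' 2 = SXH_PROXY_SET_PROXY
        .Open "GET", url, False         ' False = synchronous request
        .setRequestHeader "User-Agent", "Mozilla/5.0"
        .send                           ' raises a run-time error on timeout
        If .Status = 200 Then TryFetch = .responseText
    End With
    Exit Function

Failed:
    ' Timeout or connection error: report failure with an empty string.
    TryFetch = vbNullString
End Function

' Hypothetical usage: try each proxy until one returns a page.
Sub Demo()
    Dim oProxy As Variant, body As String
    For Each oProxy In Array("50.246.120.125:8080", "198.204.253.115:3128")
        body = TryFetch("https://stackoverflow.com/questions/tagged/web-scraping", _
                        CStr(oProxy), 5000)
        If Len(body) > 0 Then
            Debug.Print "Working proxy: " & oProxy
            Exit For
        End If
    Next oProxy
End Sub
```

Because the request is synchronous, no readyState polling loop is needed, and a dead proxy surfaces as an error at .send rather than as an endless wait. The returned text can then be parsed with HTMLDocument as in the question.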