Search code examples
excelgetelementsbytagnamevba

Loop scraping until number VBA Excel Getelementby


I am scraping this page at work, it is after a log-in but I will try to explain the situation as good as I can

I got a page where I want to get the number that appears after a random amount of time have passed (Depending on the stat of our server, computer etc) It is not scraping until IE.readystat <> 4 but the numbers do still not always appear until a little bit later then that

Sent = ie.document.getElementsByTagName("span")(25).innertext
Sheet4.Range("D" & i).Value = Sent

There are 3 things I get back are:

1 Run Time Error: 91

2 The text "Loading..."

3 12,403 (The number that I am looking for)

I was wondering if there is anyone out there that know if I can loop the scraping until I actually get a number back and then move on to the next task? Without stopping at the "Run time error:91" (Just get excel to ignore that message.


Solution

  • Dynamic JSON (or other) content can really foul up retrieving a web page as both ie.Busy and ie.ReadyState appear to have completed the page delivery but the desired content is yet to arrive. I've been fighting a losing battle with one of these sites on and off for a while now and it doesn't help that the web designers have the professional competency bar set pretty low. Here is what I'm currently using, roughly translated for your purposes.

     Dim url as String
     url = "https://www.mydomain.com"
     ie.Navigate2 url, (navNoHistory + navNoReadFromCache + navNoWriteToCache)  ' &H2 + &H4 + &H8
     Do While ie.Busy Or ie.ReadyState <> READYSTATE_COMPLETE: DoEvents: Loop   ' READYSTATE_COMPLETE is 4
     Do While ie.Document.getElementsByTagName("span").length < 26: DoEvents: Loop
     Sent = ie.document.getElementsByTagName("span")(25).innertext
     Sheet4.Range("D" & i).Value = Sent
    

    You were looking for the 26th <span> element so if the page is never going to send across at least 26 <span>'s then you will be in a forever loop. There are ways to a) eventually give up or b) send an ie.Refresh for another attempt. If you break the loop into separate code lines, you could test against an ultimate time out.

    Testing the .style.visible property of the Loading... text is another method. Unfortunately, I cannot be more specific without seeing at least a portion of that HTML code.