I am new scraping web data and also using For...Next. I am trying to get data (all pages) from a website but it seems the code is wrong, since I get error 91. This is the code:
Dim ie As Object
Sub connect()
Set ie = CreateObject("INTERNETEXPLORER.APPLICATION")
ie.NAVIGATE "https://www.worldathletics.org/world-rankings/100m/men"
ie.Visible = True
End Sub
Sub id_tr_td_for()
Range("a1:z10000").ClearContents
For i = 0 To 10
For j = 0 To 5
Cells(i + 1, j + 1) = ie.document.getElementById("toplists").getElementsByTagName("tr")(i).getElementsByTagName("td")(j).innerText
Next
Next
End Sub
Can somebody help me with it and also to let me know who can I list all pages?
Thank you.
I'm not sure where the error comes from, I got it too.
The following code should be helpful, it will print the contents of the table for the specified page(s) to the debug window.
The following code should copy all the data for selected pages to sheet1
You will need to Add a couple of references in the VBA Editor to be able to use it. (Tools Menu, References and then find and select them) Microsoft HTML Object Library
and Microsoft Internet Controls
Const MaxPage = 2 ' set to 26 (or however many there are) - at 2 for testing purposes
Dim Browser As InternetExplorer
Sub Start()
Dim Page As Integer: Page = 1 ' start at page 1
Dim PageDocument As IHTMLDocument
Dim RecordRow As IHTMLElementCollection
Dim RecordItem As IHTMLElement
Dim Sheet As Worksheet: Set Sheet = ThisWorkbook.Worksheets("Sheet1") ' output sheet
If Browser Is Nothing Then
Set Browser = New InternetExplorer
End If
Dim oRow As Integer: oRow = 2 ' begin output at row 2 (account for header)
Dim Record As Integer
For Page = 1 To MaxPage
LoadPage Page
For Record = 0 To 99 ' zero index, 100 items (1-99)
Set PageDocument = Browser.Document
Set RecordRow = PageDocument.getElementById("toplists").getElementsByTagName("table")(0).getElementsByTagName("tbody")(0).getElementsByTagName("tr")(Record).getElementsByTagName("td")
Sheet.Cells(oRow, 1).Value = Trim(RecordRow(0).innerText)
Sheet.Cells(oRow, 2).Value = Trim(RecordRow(1).innerText)
Sheet.Cells(oRow, 3).Value = Trim(RecordRow(2).innerText)
Sheet.Cells(oRow, 4).Value = Trim(RecordRow(3).innerText)
Sheet.Cells(oRow, 5).Value = Trim(RecordRow(4).innerText)
Sheet.Cells(oRow, 6).Value = Trim(RecordRow(5).innerText)
oRow = oRow + 1
Next Record
Next Page
Browser.Quit
End Sub
Sub LoadPage(ByVal PageNumber As Integer)
Debug.Print "Navigating to Page #" & CStr(PageNumber)
Browser.navigate "https://www.worldathletics.org/world-rankings/100m/men?page=" & CStr(PageNumber)
While Browser.readyState <> 4 Or Browser.Busy: DoEvents: Wend
Debug.Print "Navigation Complete"
End Sub
Updated Code
The Index Out-of-Bound error likely occurred due to the hard-coded indexes, if a page does not have 99 records it will fail, if a record doesn't have 5 fields, it will fail. The following code does away with indexes and just scrapes every row and cell it finds. You shouldn't get index errors but the output could be jagged.
Further Update
The 462 error was caused by the Browser.Quit
. This closes the browser but does not set the reference to Nothing
so when you run the code again it is trying to use a non-existent browser. Explicitly setting it to nothing at the end fixes this.
There is no link in the competitor column, the whole row has a data-url which is handled by something else. That URL can easily be accessed though.
Sub NewStart()
Dim PageDocument As IHTMLDocument
Dim Records As IHTMLElementCollection
Dim Record As IHTMLElement
Dim RecordItems As IHTMLElementCollection
Dim RecordItem As IHTMLElement
Dim OutputRow As Integer: OutputRow = 2
Dim OutputColumn As Integer
Dim Page As Integer
Dim Sheet As Worksheet: Set Sheet = ThisWorkbook.Worksheets("Sheet1")
If Browser Is Nothing Then
Set Browser = New InternetExplorer
Browser.Visible = True
End If
For Page = 1 To MaxPage
LoadPage Page
Set PageDocument = Browser.Document
Set Records = PageDocument.getElementById("toplists").getElementsByTagName("table")(0).getElementsByTagName("tbody")(0).getElementsByTagName("tr")
For Each Record In Records
Set RecordItems = Record.getElementsByTagName("td")
OutputColumn = 1
For Each RecordItem In RecordItems
Sheet.Cells(OutputRow, OutputColumn).Value = Trim(RecordItem.innerText)
OutputColumn = OutputColumn + 1
Next RecordItem
Sheet.Cells(OutputRow, OutputColumn).Value = "http://worldathletics.org/" & Record.getAttribute("data-athlete-url") ' This will add the link after the last column
OutputRow = OutputRow + 1
Next Record
Next Page
Browser.Quit
Set Browser = Nothing ' This will fix the 462 error
End Sub