I would like to extract data from betexplorer.com. I want to extract two different pieces of data from the following URL:
https://www.betexplorer.com/soccer/s...eague-1/stats/
I would like to extract Matches Played and Matches Remaining I would like to extract Home Goals and Away Goals (per match)
I have the the code to do that and it is as follows:
Option Explicit
Sub GetSoccerStats()
'Set a reference (VBE > Tools > References) to the following libraries:
' 1) Microsoft XML, v6.0
' 2) Microsoft HTML Object Library
Dim xmlReq As New MSXML2.XMLHTTP60
Dim objDoc As New MSHTML.HTMLDocument
Dim objTable As MSHTML.htmlTable
Dim objTableRow As MSHTML.htmlTableRow
Dim strURL As String
Dim strResp As String
Dim strText As String
Dim rw As Long
strURL = "https://www.betexplorer.com/soccer/south-korea/k-league-1/stats/"
With xmlReq
.Open "GET", strURL, False
.send
If .Status <> 200 Then
MsgBox "Error " & .Status & ": " & .statusText
Exit Sub
End If
strResp = .responseText
End With
Worksheets.Add
objDoc.body.innerHTML = strResp
Set objTable = objDoc.getElementsByClassName("table-main leaguestats")(0)
If Not objTable Is Nothing Then
rw = 1
For Each objTableRow In objTable.Rows
strText = objTableRow.Cells(0).innerText
Select Case strText
Case "Matches played", "Matches remaining", "Home goals", "Away goals"
Cells(rw, "a").Value = objTableRow.Cells(0).innerText
Cells(rw, "b").Value = objTableRow.Cells(1).innerText
Cells(rw, "c").Value = objTableRow.Cells(2).innerText
rw = rw + 1
End Select
Next objTableRow
Columns("a").AutoFit
End If
Set xmlReq = Nothing
Set objDoc = Nothing
Set objTable = Nothing
Set objTableRow = Nothing
End Sub
This code works however i want to take it a step further.
I actually want to run this macro for many different URL's on the same site. I have a worksheet already created that has a list of Football Leagues (in the rows), the columns hold the data.
You can find the file here : https://www.dropbox.com/s/77sol24sty75w5z/Avg%20Goals.xlsm?dl=0
This is a file where i will add leagues to the rows as i go. Is it possible to adapt the code that extracts the data so that it can populate the columns in my sheet? I do not need to pull in the names of the data (matches remaining, home goals, away goals etc) as this code does, i only need the figures. The extracted figures would have to populate the columns as per the sheet (so each row contains the data for each league. As you can see there are a few leagues so it would need to loop through each row and then use the corresponding URL for that row.
You will notice that there is a column that contains the word CURRENT. This is to indicate that it should use the URL in the Current URL column. If I change the value to LAST i would like it to use the URL in the Last URL column.
For each league it will be different if I use CURRENT or LAST.
Here is a picture of expected output:
Any help is greatly appreciated.
Keeping in line with your code this will output the data for those items in columns M:T. I have a helper function, GetLinks
, which generates an array of final urls to used based on the value in column K:
inputArray = GetLinks(inputArray)
This array is looped and xhr requests are issued for the information. All the results information is stored in an array, results
, which is written out in one go to the sheet at the end.
I work with array throughout as you don't want to keep reading from the sheet; that is an expensive operation which slows your code. For the same reason, if <> 200 occurs, I print to the immediate window the message and the url so as to not slow the code. You effectively have a log then you can review at the end.
The retrieved results are written out from column M, but as the data is in array, you can easily write out to where ever you want; simply change the start cell for pasting from M4
to which ever top leftmost cell you want. Your existing columns do not have percentages in, so I felt safe to assume you expected the written out data to be in new columns (possibly even in a different sheet).
Option Explicit
Public Sub GetSoccerStats()
Dim xmlReq As New MSXML2.XMLHTTP60, response As String
Dim objDoc As New MSHTML.HTMLDocument, text As String
Dim lastRow As Long, dataSheet As Worksheet, inputArray(), i As Long
Set dataSheet = ThisWorkbook.Worksheets("AVG GOAL DATA")
With dataSheet
lastRow = .Cells(.Rows.Count, "B").End(xlUp).Row
End With
inputArray = dataSheet.Range("J4:L" & lastRow).Value
inputArray = GetLinks(inputArray)
Dim results(), r As Long, c As Long
ReDim results(1 To UBound(inputArray, 1), 1 To 8)
With xmlReq
For i = LBound(inputArray, 1) To UBound(inputArray, 1)
r = r + 1
.Open "GET", inputArray(i, 4), False
.send
If .Status <> 200 Then
Debug.Print inputArray(i, 4), vbTab, "Error " & .Status & ": " & .statusText
Else
response = .responseText
objDoc.body.innerHTML = response
Dim objTable As MSHTML.HTMLTable, objTableRow As MSHTML.HTMLTableRow
Set objTable = objDoc.getElementsByClassName("table-main leaguestats")(0)
If Not objTable Is Nothing Then
c = 1
For Each objTableRow In objTable.Rows
text = objTableRow.Cells(0).innerText
Select Case text
Case "Matches played", "Matches remaining", "Home goals", "Away goals"
results(r, c) = objTableRow.Cells(1).innerText
results(r, c + 1) = objTableRow.Cells(2).innerText
c = c + 2
End Select
Next objTableRow
End If
End If
Set objTable = Nothing
Next
End With
dataSheet.Range("M4").Resize(UBound(results, 1), UBound(results, 2)) = results
End Sub
Public Function GetLinks(ByRef inputArray As Variant) As Variant
Dim i As Long
ReDim Preserve inputArray(1 To UBound(inputArray, 1), 1 To UBound(inputArray, 2) + 1)
For i = LBound(inputArray, 1) To UBound(inputArray, 1)
inputArray(i, 4) = IIf(inputArray(i, 1) = "CURRENT", inputArray(i, 2), inputArray(i, 3))
Next
GetLinks = inputArray
End Function
Layout of file:
Given large number of requests led to blocking here is IE version:
'VBE > Tools > References:
'1: Microsoft HTML Object library 2: Microsoft Internet Controls
Public Sub GetSoccerStats()
Dim ie As Object, t As Date
Dim objDoc As New MSHTML.HTMLDocument, text As String
Dim lastRow As Long, dataSheet As Worksheet, inputArray(), i As Long
Const MAX_WAIT_SEC As Long = 10
Set dataSheet = ThisWorkbook.Worksheets("AVG GOAL DATA")
Set ie = CreateObject("InternetExplorer.Application")
With dataSheet
lastRow = .Cells(.Rows.Count, "B").End(xlUp).Row
End With
inputArray = dataSheet.Range("C4:E" & lastRow).Value
inputArray = GetLinks(inputArray)
Dim results(), r As Long, c As Long
ReDim results(1 To UBound(inputArray, 1), 1 To 8)
With ie
.Visible = True
For i = LBound(inputArray, 1) To UBound(inputArray, 1)
r = r + 1
.navigate2 inputArray(i, 4)
While .Busy Or .readyState < 4: DoEvents: Wend
Dim objTable As MSHTML.HTMLTable, objTableRow As MSHTML.HTMLTableRow
t = timer
Do
DoEvents
On Error Resume Next
Set objTable = .document.getElementsByClassName("table-main leaguestats")(0)
On Error GoTo 0
If Timer - t > MAX_WAIT_SEC Then Exit Do
Loop While objTable Is Nothing
If Not objTable Is Nothing Then
c = 1
For Each objTableRow In objTable.Rows
text = objTableRow.Cells(0).innerText
Select Case text
Case "Matches played", "Matches remaining", "Home goals", "Away goals"
results(r, c) = objTableRow.Cells(1).innerText
results(r, c + 1) = objTableRow.Cells(2).innerText
c = c + 2
End Select
Next objTableRow
End If
Set objTable = Nothing
Next
.Quit
End With
dataSheet.Range("F4").Resize(UBound(results, 1), UBound(results, 2)) = results
End Sub