Search code examples
excelvbaweb-scraping

How to get specific Data from Website (HTML)?


I am trying to extract specific data from a website and load it into my Excel worksheet.

For example, I want to extract the Metascore from https://www.metacritic.com/game/sid-meiers-civilization-vi/. This information is located within a specific element.

The element I want to extract
1

Private Sub URL_Load(ByVal sURL As String)
    'Variablen deklarieren
    Dim appInternetExplorer As Object
    Dim htmlTxt As String
    Dim spanTxt2 As String
    
    Set appInternetExplorer = CreateObject("InternetExplorer.Application")
    appInternetExplorer.navigate sURL
    Do: Loop Until appInternetExplorer.Busy = False
    Do: Loop Until appInternetExplorer.Busy = False
    spanTxt = appInternetExplorer.document.DocumentElement.all.tags("SPAN")
    'objSelect = appInternetExplorer.document.DocumentElement.all.tags("SPAN")
    Debug.Print htmlTxt
    Set appInternetExplorer = Nothing
    Close
    'Mache hier irgendwas mit dem Text: Parsen, ausgeben, speichern
    MsgBox "Der Text wurde ausgelesen!"
End Sub

In this code, the variable "spanTxt" is described with the following value: X. Unfortunately, this is not the element I want to extract.

How can I extract a specific element?

I tried:

htmlTxt = appInternetExplorer.document.DocumentElement.outerHTML
htmlTxt1(1) = appInternetExplorer.document.DocumentElement.innerHTML
htmlTxt2 = appInternetExplorer.document.DocumentElement.innerText

Solution

  • Since you're querying a static page, you don't actually have to be familiar with any web technologies or use any external objects:-

    Public Sub ImportWebPage()
        Dim ws As Worksheet: Set ws = Sheet1    'change as required'
        With ws.QueryTables.Add("URL;https://www.metacritic.com/game/sid-meiers-civilization-vi/", ws.Range("A1"))
            .WebSelectionType = xlEntirePage
            .BackgroundQuery = False
            .Refresh
        End With
        Dim metascore As Double
        With ws
            metascore = .Columns(1).Find("meta").Offset(1).Value
            .Columns(1).Clear
            .QueryTables(1).Delete
        End With
        MsgBox "Metascore is " & metascore
    End Sub
    

    Note that this approach imports the full html (so you need a 'spare' range on which to store it) and then uses regular functionality to locate the metascore.