Search code examples
vbafunctionweb-scrapingqueryselector

Trouble reusing the same Http after passing it from a sub to a function


I've written a script using xhr to parse the link of the first post from this website and then passed the link and the http to a function to fetch the title from it's inner page. It's important that I pass both the link and the http to the function in order to reuse the same http that I have used in the first place.

My script appears to be working in the right way but I'm not sure I did the whole thing in the right way. The reason for my confusion is that I get the result when I use like getHTTP(ByVal Http, ByVal link) As Variant. I even get the result when I go for getHTTP(ByVal Http, ByVal link) As String or getHTTP(ByVal Http, ByVal link). Moreover, I didn't explicitly define link as String or Http as XMLHTTP60 within function parameters.

I've tried with (working flawlessly):

Function getHTTP(ByVal Http, ByVal link) As Variant
    Dim Html As New HTMLDocument, oTitle$
    
    With Http
        .Open "GET", link, False
        .send
        Html.body.innerHTML = .responseText
        oTitle = Html.querySelector("h1[itemprop='name'] > a").innerText
        getHTTP = oTitle
    End With
End Function

Sub GetInfo()
    Const base$ = "https://stackoverflow.com"
    Const Url$ = "https://stackoverflow.com/questions/tagged/web-scraping"
    Dim Http As New XMLHTTP60, Html As New HTMLDocument
    Dim firstLink$, postTitle$

    With Http
        .Open "GET", Url, False
        .send
        Html.body.innerHTML = .responseText
        firstLink = base & Replace(Html.querySelector(".summary .question-hyperlink").getAttribute("href"), "about:", "")
        postTitle = getHTTP(Http, firstLink)
        MsgBox postTitle
    End With
End Sub

What is the right way to pass http between sub and function in order to reuse the same http?


Solution

  • Something like this might be appropriate:

    Sub GetInfo()
    
        Const base As String = "https://stackoverflow.com"
        Const url As String = "https://stackoverflow.com/questions/tagged/web-scraping"
        Dim Html As HTMLDocument
        Dim firstLink As String, postTitle As String
    
        firstLink = base & Replace(GetPage(url).querySelector(".summary .question-hyperlink") _
                    .getAttribute("href"), "about:", "")
        Debug.Print firstLink
    
        postTitle = GetPage(firstLink).querySelector("h1[itemprop='name'] > a").innerText
        Debug.Print postTitle
    
    End Sub
    
    Function GetPage(url As String) As HTMLDocument
        Dim Html As HTMLDocument
        Static Http As XMLHTTP60
    
        If Http Is Nothing Then
            Set Http = New XMLHTTP60
            'log in here
        End If
    
        With Http
            .Open "GET", url, False
            .send
            If .Status = 200 Then
                Set Html = New HTMLDocument
                Html.body.innerHTML = .responseText
            Else
                Debug.Print .Status
                'warn user
            End If
        End With
        Set GetPage = Html
    End Function
    

    Above all assumes the "happy path" so probably would benefit from some exception handling...