Search code examples
excelvbaweb-scrapingxmlhttprequestconcatenation

Concatenate referenced URL into XML HTTP Request


The following snippet of code sends a XML request to the following site

Sub GetContents()
   
            Dim XMLReq As New MSXML2.XMLHTTP60
            
            XMLReq.Open "Get", "https://echa.europa.eu/brief-profile/-/briefprofile/100.028.723", False
            XMLReq.send

End Sub

I have another Sub routine GetURL() which prints out the desired URL in this case: https://echa.europa.eu/brief-profile/-/briefprofile/100.028.723

How can I essentially concatenate the output of GetURL() into the BstrUrl? i.e.

XMLReq.Open "Get", "x", False where x is the output of GetURL()

Despite various attempts the syntax is not accepted as a URL.


Solution

  • Assuming you are combining from your earlier question then you need to ensure you write a function which returns the url (as Tim Williams has pointed out). I would expand upon this, in that I think you would need to consider adding a test to ensure both the request succeeded, there were results, and to pass the searchKeyWord as an argument to make your function more reusable. Along the same lines, you could pass the xmlhttp object into the function, so as to avoid continually creating and destroying them.

    Avoid auto-instantiation, as you can get unexpected results, and Hungarian style notation. Personally, I also avoid those type characters, as they are harder to read.

    vbNullString will offer faster assignment than = "".

    I would also use a shorter, faster, and more reliable css pattern to retrieve the url, based on classes and a parent child relationship of two elements.


    Public Sub GetContents()
        Dim searchKeyWord As String, xmlReq As MSXML2.XMLHTTP60, html As MSHTML.HTMLDocument, url As String
        
        searchKeyWord = "Acetone"
        Set xmlReq = New MSXML2.XMLHTTP60
        
        url = GetUrl(searchKeyWord, xmlReq)
        
        Set html = New MSHTML.HTMLDocument
        
        If url <> "N/A" Then
        
            With xmlReq
                .Open "GET", url, False
                .send
                If .Status = 200 Then
                    html.body.innerHTML = .responseText
                    Debug.Print html.querySelector("title").innerText
                End If
            End With
           
        End If
        
    End Sub
    
    
    Public Function GetUrl(ByVal searchKeyWord As String, ByVal http As MSXML2.XMLHTTP60) As String
     
        Const url = "https://echa.europa.eu/search-for-chemicals?p_auth=5ayUnMyz&p_p_id=disssimplesearch_WAR_disssearchportlet&p_p_lifecycle=1&p_p_state=normal&p_p_col_id=_118_INSTANCE_UFgbrDo05Elj__column-1&p_p_col_count=1&_disssimplesearch_WAR_disssearchportlet_javax.portlet.action=doSearchAction&_disssimplesearch_WAR_disssearchportlet_backURL=https%3A%2F%2Fecha.europa.eu%2Finformation-on-chemicals%3Fp_p_id%3Ddisssimplesearchhomepage_WAR_disssearchportlet%26p_p_lifecycle%3D0%26p_p_state%3Dnormal%26p_p_mode%3Dview%26p_p_col_id%3D_118_INSTANCE_UFgbrDo05Elj__column-1%26p_p_col_count%3D1%26_disssimplesearchhomepage_WAR_disssearchportlet_sessionCriteriaId%3D"
        Dim html As MSHTML.HTMLDocument, dict As Object, i As Long, r As Long
        Dim dictKey As Variant, payload$, ws As Worksheet
        
        Set html = New MSHTML.HTMLDocument
        Set dict = CreateObject("Scripting.Dictionary")
        Set ws = ThisWorkbook.Worksheets("Sheet1")
    
        dict("_disssimplesearchhomepage_WAR_disssearchportlet_formDate") = "1621017052777" 'timestamp
        dict("_disssimplesearch_WAR_disssearchportlet_searchOccurred") = "true"
        dict("_disssimplesearch_WAR_disssearchportlet_sskeywordKey") = searchKeyWord
        dict("_disssimplesearchhomepage_WAR_disssearchportlet_disclaimer") = "true"
        dict("_disssimplesearchhomepage_WAR_disssearchportlet_disclaimerCheckbox") = "on"
    
        payload = vbNullString
        
        For Each dictKey In dict
            payload = IIf(Len(dictKey) = 0, WorksheetFunction.EncodeURL(dictKey) & "=" & WorksheetFunction.EncodeURL(dict(dictKey)), _
                          payload & "&" & WorksheetFunction.EncodeURL(dictKey) & "=" & WorksheetFunction.EncodeURL(dict(dictKey)))
        Next dictKey
        
        With http
            .Open "POST", url, False
            .setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.135 Safari/537.36"
            .setRequestHeader "Content-type", "application/x-www-form-urlencoded"
            .send (payload)
            If .Status = 200 Then
                html.body.innerHTML = .responseText
            Else
                GetUrl = "N/A"
                Exit Function
            End If
        End With
        
        Dim result As Boolean
        
        result = html.querySelectorAll(".lfr-search-container  .substanceNameLink").Length > 0
        
        GetUrl = IIf(result, html.querySelector(".lfr-search-container  .substanceNameLink").href, "N/A")
    End Function