I was looking at the answer, by @OmegaStripes, to this question How to get a particular InnerText from a specific class? Here one uses the Split
function, and a specified delimiter string, to extract an href
from .responseBody
.
I then tried to replicate this to extract the following href
:
"https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2018/02/New-AmbSYS-to-2018-Jan.csv"
from NHS England's Ambulance Quality Indicators
HTML snippet:
<main class="main group" role="main">
<div class="page-content" id="main-content">
<header>
<h1>Ambulance Quality Indicators</h1>
</header>
<article class="rich-text">
<p></p>
<p></p>
<p></p>
<p></p>
<p></p>
<p><strong>CSV Data</strong><br>
These files have the same data as other published spreadsheets, but without any formatting:<br>
<a href="https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2018/02/New-AmbSYS-to-2018-Jan.csv" class="csv-link" onclick="ga('send', 'event', 'Downloads', 'CSV', 'https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2018/02/New-AmbSYS-to-2018-Jan.csv');">New Systems Indicators August 2017 to January 2018 (CSV, 23KB)</a><br>
</article>
</div>
</main>
Problem:
I am getting response text back that looks as follows:
Example response text:
From a quick bit of research, see references, I am guessing this is perhaps an encoding problem?
I tried setting a .SetRequestHeader
.setRequestHeader "Content-Type", _
"application/x-www-form-urlencoded; charset=UTF-8"
This made no difference to the output.
To be honest, I haven't a clue how to resolve this.
Any suggestions please on how I get the expected response text? i.e. that I can parse for the href
of interest.
Context:
This is part of a bigger piece of work where:
I want to scrape that CSV link (the name of which will change each month), without having the browser pop-up
Download the target file content
Use ADODB.Stream to write the binary file out.
This process was outlined by @OmegaStripes in response to my question Return focus to ThisWorkbook.Activesheet after XMLHTTP60 file download . I am trying to understand and implement that suggestion currently.
Code:
Option Explicit
Public Const url As String = "https://www.england.nhs.uk/statistics/statistical-work-areas/ambulance-quality-indicators/"
Public aBody As String
Sub Testing()
' Download via XHR
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", url, False
.setRequestHeader "Content-Type", "application/x-www-form-urlencoded; charset=utf-8"
.send
' Get binary response content
aBody = .responseBody
End With
ActiveSheet.Range("A1") = aBody
End Sub
References:
So credit goes to @FlorentB for this solution and a shout out to @OmegaStripes for the suggestion.
As suggested, the problem indeed was the .responseBody
was returning an array of bytes encoded as UTF-8. As pointed out, I was casting it to a String (UTF-16 encoding) hence all these foreign characters.
I used @Tomalak's function BytesToString
, with minor changes, to handle the conversion to string.
Code:
Option Explicit
Public Const url As String = "https://www.england.nhs.uk/statistics/statistical-work-areas/ambulance-quality-indicators/"
Public aBody As String 'this is causing the conversion
Const adTypeBinary As Byte = 1
Const adTypeText As Byte = 2
Const adModeReadWrite As Byte = 3
Public Const strPath As String = "C:\Users\User\Desktop\testXMLHTTPOutput"
Public Sub Testing()
' Download via XHR
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", url, False
.send
' Get binary response content
aBody = BytesToString(.responseBody, "UTF-8")
End With
Dim fso As Object 'late binding
Set fso = CreateObject("Scripting.FileSystemObject")
Dim oFile As Object
Set oFile = fso.CreateTextFile(strPath)
oFile.WriteLine aBody
oFile.Close
Set fso = Nothing
Set oFile = Nothing
End Sub
'ADODB.Stream with stream.CharSet = "UTF-8"
'http://msdn.microsoft.com/en-us/library/windows/desktop/ms675032%28v=vs.85%29.aspx
Public Function BytesToString(ByVal bytes As Variant, ByVal charset As String) As String
With CreateObject("ADODB.Stream")
.Mode = adModeReadWrite
.Type = adTypeBinary
.Open
.Write bytes
.Position = 0
.Type = adTypeText
.charset = charset
BytesToString = .ReadText
End With
End Function
Useful other link here: