
MSXML2.ServerXMLHTTP and national characters


This question is related to "Character encoding Microsoft.XmlHttp in Vbscript", but differs in one respect: the national characters are in the domain name, not only in the arguments.

The task is: download a page from the given URL.

I have already solved the problem of passing a UTF-8 string into VBScript by reading it from a UTF-8 encoded file through ADO.

But now, when I try to open the URL, MSXML2.ServerXMLHTTP returns the error: The URL is invalid.

Here is the VBScript code:

Set objStream = CreateObject("ADODB.Stream")
objStream.CharSet = "utf-8"
objStream.Open
objStream.LoadFromFile("fileWithURL.txt")
url = objStream.ReadText()
objStream.Close

Set XMLHttpReq = CreateObject("MSXML2.ServerXMLHTTP")
XMLHttpReq.Open "GET", url, False
XMLHttpReq.send
WEBPAGE = XMLHttpReq.responseText

If you put something like hxxp://россия.рф/main/page5.html into the UTF-8 encoded fileWithURL.txt, the script raises an error, while it works fine with hxxp://google.com.

The workaround is to use the ASCII representation of the domain name, but I haven't yet found a Punycode encoder for VBScript (apart from Chilkat, which is overkill for my task).

I would appreciate your help with either the main problem or the workaround.


Solution

  • I took an amazing journey into the depths of my hard drive and found code written by / for Jesper Høy. It was the source code of SimpleDNS Plus' IDN Conversion Tool at that time.

    Archive.org page snapshot: http://www.simpledns.com/idn-convert.asp
    Archive.org file snapshot: idn-convert-asp.zip

    You can also copy the whole code from this gist.
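
    If neither copy is reachable, the encoding step itself is small. The following is a minimal sketch of the RFC 3492 Punycode encoder applied to each label of a host name. It is not the code from the archive: the names PunyDigit, PunyAdapt, PunyEncodeLabel and PunyEncodeHost are made up for this example, it handles only characters in the Basic Multilingual Plane, and LCase stands in for the full IDNA mapping step.

    ```vbscript
    Option Explicit

    ' Map a Punycode digit value to its character: 0..25 -> a..z, 26..35 -> 0..9.
    Function PunyDigit(ByVal d)
        If d < 26 Then
            PunyDigit = Chr(97 + d)
        Else
            PunyDigit = Chr(22 + d)
        End If
    End Function

    ' Bias adaptation (RFC 3492, section 6.1): base 36, tmin 1, tmax 26, skew 38, damp 700.
    Function PunyAdapt(ByVal delta, ByVal numPoints, ByVal firstTime)
        Dim k : k = 0
        If firstTime Then delta = delta \ 700 Else delta = delta \ 2
        delta = delta + (delta \ numPoints)
        Do While delta > 455                ' ((base - tmin) * tmax) \ 2
            delta = delta \ 35              ' base - tmin
            k = k + 36
        Loop
        PunyAdapt = k + ((36 * delta) \ (delta + 38))
    End Function

    ' Encode one label; pure ASCII labels are returned unchanged.
    Function PunyEncodeLabel(ByVal label)
        Dim cps(), i, c, n, delta, bias, h, b, m, q, k, t, out, length
        length = Len(label)
        If length = 0 Then
            PunyEncodeLabel = ""
            Exit Function
        End If
        ReDim cps(length - 1)
        out = "" : b = 0
        For i = 0 To length - 1
            c = AscW(Mid(label, i + 1, 1))
            If c < 0 Then c = c + 65536     ' AscW is signed for U+8000 and above
            cps(i) = c
            If c < 128 Then
                out = out & ChrW(c)         ' basic code points are copied as-is
                b = b + 1
            End If
        Next
        If b = length Then
            PunyEncodeLabel = label
            Exit Function
        End If
        If b > 0 Then out = out & "-"
        n = 128 : delta = 0 : bias = 72 : h = b
        Do While h < length
            m = 1114112                     ' above any valid code point
            For i = 0 To length - 1         ' smallest code point not yet handled
                If cps(i) >= n And cps(i) < m Then m = cps(i)
            Next
            delta = delta + (m - n) * (h + 1)
            n = m
            For i = 0 To length - 1
                c = cps(i)
                If c < n Then delta = delta + 1
                If c = n Then
                    q = delta : k = 36
                    Do                      ' emit delta as a variable-length integer
                        If k <= bias Then
                            t = 1
                        ElseIf k >= bias + 26 Then
                            t = 26
                        Else
                            t = k - bias
                        End If
                        If q < t Then Exit Do
                        out = out & PunyDigit(t + ((q - t) Mod (36 - t)))
                        q = (q - t) \ (36 - t)
                        k = k + 36
                    Loop
                    out = out & PunyDigit(q)
                    bias = PunyAdapt(delta, h + 1, (h = b))
                    delta = 0
                    h = h + 1
                End If
            Next
            delta = delta + 1
            n = n + 1
        Loop
        PunyEncodeLabel = "xn--" & out
    End Function

    ' Encode every dot-separated label of a host name.
    Function PunyEncodeHost(ByVal host)
        Dim labels, i
        labels = Split(LCase(host), ".")
        For i = 0 To UBound(labels)
            labels(i) = PunyEncodeLabel(labels(i))
        Next
        PunyEncodeHost = Join(labels, ".")
    End Function

    ' The host from the question, built with ChrW so the .vbs file encoding does not matter.
    Dim host
    host = ChrW(1088) & ChrW(1086) & ChrW(1089) & ChrW(1089) & ChrW(1080) & ChrW(1103) & _
           "." & ChrW(1088) & ChrW(1092)
    WScript.Echo PunyEncodeHost(host)       ' prints xn--h1alffa9f.xn--p1ai
    ```

    To use this sketch in place of the archived code, call PunyEncodeHost wherever DomainPunyEncode is expected.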

    Create a function to convert URLs.

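    Function DummyPuny(ByVal url)
        ' Splitting "http://host/path" on "/" yields "http:", "", "host", "path",
        ' so index 2 holds the host name.
        Dim rSegments : rSegments = Split(url, "/")

        ' Encode only the host; the scheme and path are left untouched.
        If UBound(rSegments) > 1 Then
            rSegments(2) = DomainPunyEncode(rSegments(2))
        End If

        DummyPuny = Join(rSegments, "/")
    End Function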
    

    Then convert your URL before making the request.

    XMLHttpReq.Open "GET", DummyPuny(url), False
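
    Put together with the code from the question, the whole script might look like the sketch below. It assumes DummyPuny and DomainPunyEncode are defined in the same file and that the script runs under cscript or wscript. Stripping line breaks and checking the status are additions that are not part of the original code: a trailing newline in fileWithURL.txt would also make the URL invalid.

    ```vbscript
    ' Read the URL from the UTF-8 encoded file, as in the question.
    Set objStream = CreateObject("ADODB.Stream")
    objStream.CharSet = "utf-8"
    objStream.Open
    objStream.LoadFromFile "fileWithURL.txt"
    url = objStream.ReadText()
    objStream.Close

    ' Assumption: the file may end with a line break; remove it and any surrounding spaces.
    url = Trim(Replace(Replace(url, vbCr, ""), vbLf, ""))

    ' Convert the host part to its ASCII form, then request the page.
    Set XMLHttpReq = CreateObject("MSXML2.ServerXMLHTTP")
    XMLHttpReq.Open "GET", DummyPuny(url), False
    XMLHttpReq.send

    If XMLHttpReq.status = 200 Then
        WEBPAGE = XMLHttpReq.responseText
    Else
        WScript.Echo "Request failed: " & XMLHttpReq.status & " " & XMLHttpReq.statusText
    End If
    ```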