I've been messing around with ASCII in Visual Basic (on .NET) for a school project, and recently discovered something that I don't know how to get around. Converting an integer to a character using Chr(), then copying that character to another script and converting it back to an integer using Asc(), returns the wrong value in some cases. For example:
Dim i As String = Chr(29)
Console.WriteLine(i) ' Returns ↔
Dim i2 As Integer = Asc("↔")
Console.WriteLine(i2) ' Returns 63
Currently, I think this could be due to a couple of things:
The Asc() function may fall back to some other character encoding in which ↔ is 63?
Could someone explain to me why this occurs and/or show me a way of preventing it? This seems like a pretty annoying issue for people who use VB and character encoding if it is a genuine problem, though I would be surprised if it were, as I haven't been using character encoding for long and have probably missed something.
Thanks in advance!
Based on a little investigation, it seems like this is the result of a perhaps surprising interaction between the behavior of the IDE text editor and the Asc function. As another answer noted, the text editor supports the full Unicode character set (including in identifiers, e.g. ಠ_ಠ is a legal identifier in .NET). When I used a character version of the Asc test (i.e. Asc("↔"c)) and checked the disassembly, I found that it was processed as 0x2194, i.e. Unicode code point U+2194 Left Right Arrow.
This suggests that the IDE text editor is converting the ASCII character into the best equivalent Unicode code point. Calling Asc on the result returns 63, which corresponds to a question mark; I believe that is the expected result when a Unicode code point has no valid conversion. The correctness of this behavior is debatable, but I'm not sure we should expect too much of Asc, as I would view it as more of a compatibility routine anyway. It appears that it isn't designed to back-convert a Unicode code point that lies beyond U+FF (or possibly U+7F) and will instead just treat it as ?.
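To illustrate the "no valid conversion becomes ?" idea, here is a minimal sketch. It uses Encoding.ASCII as a stand-in, which is my assumption for demonstration purposes: Asc itself goes through the system's code page rather than strict ASCII, so this only mirrors the fallback behavior, it is not what Asc calls internally. The module name is made up.

```vb
Imports System
Imports System.Text

Module FallbackDemo ' hypothetical name, for illustration only
    Sub Main()
        ' Build U+2194 from its code point instead of pasting the glyph.
        Dim arrow As String = ChrW(&H2194).ToString()

        ' ASCII cannot represent U+2194, so the encoder substitutes "?".
        Dim bytes As Byte() = Encoding.ASCII.GetBytes(arrow)

        Console.WriteLine(bytes.Length) ' 1
        Console.WriteLine(bytes(0))     ' 63
        Console.WriteLine(Asc("?"c))    ' 63, the same value the question saw
    End Sub
End Module
```

The 63 in the question is therefore most likely just the code for the substituted question mark, not a code for ↔ in some other encoding.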
Note that if I call AscW instead of Asc, I get 8596 (or &H2194), which is what I would expect based on the previous investigation of the disassembly.
As a practical matter, this doesn't seem like it should be a problem. If you really care about the exact character code, then you should use Chr or ChrW to generate the character instead of relying on the IDE rendering your typed character exactly.
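As a rough sketch of that advice (the module and variable names are mine, not from the question), generating the characters from their codes keeps the round trip exact and also shows that the control character Chr(29) and the arrow U+2194 are two different characters:

```vb
Imports System

Module RoundTripDemo ' hypothetical name, for illustration only
    Sub Main()
        ' Character code 29 is a control character; build it from its code.
        Dim gs As Char = Chr(29)
        Console.WriteLine(Asc(gs))  ' 29
        Console.WriteLine(AscW(gs)) ' 29

        ' The arrow is a separate Unicode character, U+2194.
        Dim arrow As Char = ChrW(&H2194)
        Console.WriteLine(AscW(arrow)) ' 8596

        ' The two are not the same character, so the codes cannot match.
        Console.WriteLine(AscW(gs) = AscW(arrow)) ' False
    End Sub
End Module
```

If the value has to travel between scripts, passing the integer (or writing Chr(29) in the source) avoids depending on how the character happens to be displayed and copied.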