According to Spolsky I can't call myself a developer, so there is a lot of shame behind this question...
Scenario: From a C# application, I would like to take a string value from a SQL db and use it as the name of a directory. I have a secure (SSL) FTP server on which I want to set the current directory using the string value from the DB.
Problem: Everything is working fine until I hit a string value with a "special" character - I seem unable to encode the directory name correctly to satisfy the FTP server.
The code example below shows the approaches I have tried:
Process _winscp = new Process();
byte[] buffer;
string nameFromString = "Sinéad O'Connor";
_winscp.StandardInput.WriteLine("cd \"" + nameFromString + "\"");
buffer = Encoding.UTF8.GetBytes(nameFromString);
_winscp.StandardInput.WriteLine("cd \"" + Encoding.UTF8.GetString(buffer) + "\"");
buffer = Encoding.ASCII.GetBytes(nameFromString);
_winscp.StandardInput.WriteLine("cd \"" + Encoding.ASCII.GetString(buffer) + "\"");
byte[] nameFromBytes = new byte[] { 83, 105, 110, 130, 97, 100, 32, 79, 39, 67, 111, 110, 110, 111, 114 };
_winscp.StandardInput.WriteLine("cd \"" + Encoding.Default.GetString(nameFromBytes) + "\"");
The UTF8 encoding changes é to 101 (decimal) but the FTP server doesn't like it.
The ASCII encoding changes é to 63 (decimal) but the FTP server doesn't like it.
When I represent é as value 130 (decimal) the FTP server is happy, except I can't find a method that will do this for me (I had to manually construct the string from explicit bytes).
Anyone know what I should do to my string to encode the é as 130 and make the FTP server happy and finally elevate me to level 1 developer by explaining the only single thing a developer should understand?
130 isn't ASCII (ASCII is only 7 bits; see the Encoding.ASCII documentation), so Encoding.ASCII whacks the "é" into a normal "?" because it has nothing better to do. UTF-8 actually encodes the character as two bytes (decimal: 195 & 169) but preserves the code point.
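To see those byte values for yourself, here is a minimal sketch that dumps what each encoding produces for "é". The class and method names (EncodingBytesDemo, Dump) are made up for illustration, and the character is written as the escape \u00E9 so the result does not depend on how the source file itself is saved.

```csharp
using System;
using System.Text;

public static class EncodingBytesDemo
{
    // Returns the decimal byte values of 'text' under 'enc', separated by spaces.
    public static string Dump(Encoding enc, string text)
    {
        return string.Join(" ", enc.GetBytes(text));
    }

    public static void Main()
    {
        const string eAcute = "\u00E9"; // é (U+00E9)

        // ASCII has no é, so the encoder substitutes '?'.
        Console.WriteLine(Dump(Encoding.ASCII, eAcute)); // -> 63

        // UTF-8 represents U+00E9 as a two-byte sequence.
        Console.WriteLine(Dump(Encoding.UTF8, eAcute));  // -> 195 169
    }
}
```

Neither output contains 130, which is why round-tripping the string through Encoding.ASCII or Encoding.UTF8 does not help.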
Use a code page explicitly, such as Latin (CP 1252); it needs to match whatever the other side is using. As the output below shows, CP 1252 does not produce "130", so it is not the encoding you need :-) But the same principle applies: use an encoding for a specific code page.
Edit: As Hans Passant explained in a comment, the code page to use here is MS-DOS (CP 437), which produces the desired result.
// LINQPad -- Encoding is System.Text.Encoding
var enc = Encoding.GetEncoding(1252);
string.Join(" ", enc.GetBytes("Sinéad O'Connor")).Dump();
// -> 83 105 110 233 97 100 32 79 39 67 111 110 110 111 114
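For comparison, here is the same check with code page 437, as a rough sketch rather than a drop-in fix. The names Cp437Demo, GetCp437 and EncodeCommand are hypothetical. The RegisterProvider call is only needed on .NET Core / .NET 5+, where legacy code pages are not available by default; on the .NET Framework, Encoding.GetEncoding(437) should work without it. The MemoryStream stands in for whatever stream the command is written to.

```csharp
using System;
using System.IO;
using System.Text;

public static class Cp437Demo
{
    // Looks up the MS-DOS (OEM United States) code page.
    public static Encoding GetCp437()
    {
        // Required on .NET Core / .NET 5+; not needed on .NET Framework.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(437);
    }

    // Writes a quoted "cd" command through a StreamWriter that encodes as CP 437
    // and returns the raw bytes that reached the underlying stream.
    public static byte[] EncodeCommand(string directory)
    {
        using (var stream = new MemoryStream())
        {
            using (var writer = new StreamWriter(stream, GetCp437()))
            {
                writer.Write("cd \"" + directory + "\"");
            }
            // ToArray is still valid after the stream has been closed.
            return stream.ToArray();
        }
    }

    public static void Main()
    {
        string name = "Sin\u00E9ad O'Connor"; // \u00E9 is é

        Console.WriteLine(string.Join(" ", GetCp437().GetBytes(name)));
        // -> 83 105 110 130 97 100 32 79 39 67 111 110 110 111 114

        Console.WriteLine(string.Join(" ", EncodeCommand(name)));
    }
}
```

With a real process, one option (an assumption on my part, not something tested against WinSCP) would be to wrap _winscp.StandardInput.BaseStream in a StreamWriter constructed with this encoding, instead of writing to StandardInput directly.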
See: http://msdn.microsoft.com/en-us/goglobal/bb688114 for more.
Happy coding.
Btw. good selection in artists -- if it was intentional :p