c# imap decode

Decode cyrillic quoted-printable content


I'm using this sample to fetch mail from a server. The problem is that the response contains Cyrillic characters that I cannot decode. Here are the relevant headers:

Content-type: text/html; charset="koi8-r"
Content-Transfer-Encoding: quoted-printable

And here is the function that receives the response:

static void receiveResponse(string command)
{
    try
    {
        if (command != "")
        {
            if (tcpc.Connected)
            {
                dummy = Encoding.ASCII.GetBytes(command);
                ssl.Write(dummy, 0, dummy.Length);
            }
            else
            {
                throw new ApplicationException("TCP CONNECTION DISCONNECTED");
            }
        }
        ssl.Flush();

        byte[] bigBuffer = new byte[1024*16];
        int bites = ssl.Read(bigBuffer, 0, bigBuffer.Length);

        byte[] buffer = new byte[bites];
        Array.Copy(bigBuffer, 0, buffer, 0, bites);

        sb.Append(Encoding.ASCII.GetString(buffer));

        string result = sb.ToString();

        // here is an unsuccessful attempt at decoding
        result = Regex.Replace(result, @"=([0-9a-fA-F]{2})",
            m => m.Groups[1].Success
            ? Convert.ToChar(Convert.ToInt32(m.Groups[1].Value, 16)).ToString()
            : "");

        byte[] bytes = Encoding.Default.GetBytes(result);
        result = Encoding.GetEncoding("koi8r").GetString(bytes);
    }
    catch (Exception ex)
    {
        throw new ApplicationException(ex.ToString());
    }
}

How do I decode the stream correctly? The resulting string contains <p>=F0=D2=C9=D7=C5=D4 =D1 =F7=C1=CE=D1</p> instead of <p>Привет я Ваня</p>.


Solution

  • As @Max pointed out, you will need to decode the content using the encoding algorithm declared in the Content-Transfer-Encoding header.

    In your case, it is the quoted-printable encoding.

    You will need to decode the text of the message into an array of bytes and then you’ll need to convert that array of bytes into a string using the appropriate System.Text.Encoding. The name of the encoding to use will typically be specified in the Content-Type header as the charset parameter (in your case, koi8-r).
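
    The charset lookup described above could be sketched roughly as follows. This is only an illustration: CharsetLookup and FromContentType are hypothetical names, the regular expression handles only the simple charset=... form rather than full MIME parameter syntax, and the RegisterProvider call assumes .NET Core 3.0 or later (or the System.Text.Encoding.CodePages package), where legacy code pages such as koi8-r are not registered by default.

    ```csharp
    using System;
    using System.Text;
    using System.Text.RegularExpressions;

    static class CharsetLookup
    {
        // Hypothetical helper: pulls the charset parameter out of a Content-Type
        // header value and maps it to a System.Text.Encoding.
        public static Encoding FromContentType(string contentType)
        {
            // Assumption: running on .NET Core / .NET 5+, where code pages such
            // as koi8-r only become available after registering this provider.
            Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

            // Matches charset=koi8-r as well as charset="koi8-r".
            Match m = Regex.Match(contentType,
                @"charset\s*=\s*""?([^"";\s]+)""?", RegexOptions.IgnoreCase);

            if (!m.Success)
                return Encoding.ASCII; // no charset given: fall back to US-ASCII

            try {
                return Encoding.GetEncoding(m.Groups[1].Value);
            } catch (ArgumentException) {
                return Encoding.ASCII; // charset name not known to the runtime
            }
        }
    }
    ```

    For the header in the question, CharsetLookup.FromContentType("text/html; charset=\"koi8-r\"") should return the KOI8-R encoding (code page 20866).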

    Since you already have the raw bytes in the bigBuffer variable, simply perform the decoding on them, writing the decoded bytes into buffer:

    byte[] buffer = new byte[bites];
    int decodedLength = 0;
    
    for (int i = 0; i < bites; i++) {
        if (bigBuffer[i] == (byte) '=') {
            if (bites > i + 2) {
                // possible hex sequence
                byte b1 = bigBuffer[i + 1];
                byte b2 = bigBuffer[i + 2];
    
                if (IsXDigit (b1) && IsXDigit (b2)) {
                    // decode
                    buffer[decodedLength++] = (byte) ((ToXDigit (b1) << 4) | ToXDigit (b2));
                    i += 2;
                } else if (b1 == (byte) '\r' && b2 == (byte) '\n') {
                    // folded line, drop the '=\r\n' sequence
                    i += 2;
                } else {
                    // error condition, just pass it through
                    buffer[decodedLength++] = bigBuffer[i];
                }
            } else {
                // truncated? just pass it through
                buffer[decodedLength++] = bigBuffer[i];
            }
        } else {
            buffer[decodedLength++] = bigBuffer[i];
        }
    }
    
    string result = Encoding.GetEncoding ("koi8-r").GetString (buffer, 0, decodedLength);
    

    The helper functions used above:

    static byte ToXDigit (byte c)
    {
        if (c >= 0x41) {
            if (c >= 0x61)
                return (byte) (c - (0x61 - 0x0a));
    
            return (byte) (c - (0x41 - 0x0A));
        }
    
        return (byte) (c - 0x30);
    }
    
    static bool IsXDigit (byte c)
    {
        return (c >= (byte) 'A' && c <= (byte) 'F') || (c >= (byte) 'a' && c <= (byte) 'f') || (c >= (byte) '0' && c <= (byte) '9');
    }
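
    To try the approach outside the IMAP code, the same logic can be wrapped into one self-contained method. This is a minimal sketch, not a full quoted-printable implementation: QuotedPrintableSketch, IsHex and HexValue are made-up names, the input is assumed to be only the encoded body (no headers or IMAP protocol lines), and the RegisterProvider call assumes .NET Core 3.0 or later, where koi8-r is not available by default.

    ```csharp
    using System;
    using System.Text;

    static class QuotedPrintableSketch
    {
        static bool IsHex(byte c) =>
            (c >= '0' && c <= '9') || (c >= 'A' && c <= 'F') || (c >= 'a' && c <= 'f');

        static int HexValue(byte c) =>
            c >= 'a' ? c - 'a' + 10 : c >= 'A' ? c - 'A' + 10 : c - '0';

        // Step 1: quoted-printable text -> raw bytes.
        // Step 2: raw bytes -> string, using the charset from Content-Type.
        public static string Decode(byte[] input, string charset)
        {
            // Assumption: .NET Core / .NET 5+, where legacy code pages must be registered.
            Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

            byte[] output = new byte[input.Length]; // decoded data is never longer
            int n = 0;

            for (int i = 0; i < input.Length; i++) {
                // Only look ahead when two more bytes really exist after '='.
                if (input[i] == '=' && i + 2 < input.Length) {
                    byte b1 = input[i + 1], b2 = input[i + 2];

                    if (IsHex(b1) && IsHex(b2)) {
                        output[n++] = (byte) ((HexValue(b1) << 4) | HexValue(b2));
                        i += 2;
                        continue;
                    }

                    if (b1 == '\r' && b2 == '\n') {
                        i += 2; // soft line break: drop the "=\r\n" sequence
                        continue;
                    }
                }

                // Literal byte, or a malformed/truncated '=' passed through as-is.
                output[n++] = input[i];
            }

            return Encoding.GetEncoding(charset).GetString(output, 0, n);
        }
    }
    ```

    Calling QuotedPrintableSketch.Decode(Encoding.ASCII.GetBytes("<p>=F0=D2=C9=D7=C5=D4 =D1 =F7=C1=CE=D1</p>"), "koi8-r") should give back the <p>Привет я Ваня</p> expected in the question.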
    

    Of course, instead of writing your own hodgepodge IMAP library, you could just use MimeKit and MailKit ;-)