
VB.NET Convert Unicode 8 (UTF8) into Regular American ASCII


I have this problem. Here is the debugging output:

"?uƒn74tn5187r&key=6e6e0936c4e6c48be56a72eba8964df0"

should be

"?u=83n74tn5187r&key=6e6e0936c4e6c48be56a72eba8964df0"

I tried a solution from another similar question, and it did not work for me.

Dim uni As Byte() = Encoding.GetEncoding(437).GetBytes("?uƒn74tn5187r&key=6e6e0936c4e6c48be56a72eba8964df0")
Dim Ascii As String = Encoding.ASCII.GetString(uni)

Ascii = "?u?n74tn5187r&key=6e6e0936c4e6c48be56a72eba8964df0"

I'm guessing I would have to find the right code page instead of 437, maybe by brute-forcing every number until ?uƒ turns into ?u=83.

What I am really trying to do is read Unicode-32 (Brazilian-formatted) text from an email (POP3). Now that I think about it, the =83 could be getting mangled by the function below.

But without this function, the body of the POP3 email contains what looks like a variant of urlencode() output, except that it uses =20 instead of %20.

I wonder how to fix this.

 Public Shared Function DecodeQuotedPrintable(ByVal Message As String, Optional ByVal QuickClean As Boolean = False) As String
        'set up StringBuilder object with data stripped of any line continuation tags
        Dim Msg As New StringBuilder(Message.Replace("=" & vbCrLf, vbNullString))

        If QuickClean Then                                                  'perform a quick clean (clean up common basics)
            Return Msg.Replace("=" & vbCrLf, vbNullString).Replace("=0D", vbCr).Replace("=0A", _
                                   vbLf).Replace("=20", " ").Replace("=3D", "=").ToString
        Else                                                                'perform total cleaning
            'store 2-character hex values that require a leading "0"
            Dim HxData As String = "X0102030405060708090A0B0C0D0E0F"
            For Idx As Integer = 1 To &HF                                   'initially process codes 1-15, which require a leading zero
                Msg.Replace("=" & Mid(HxData, Idx << 1, 2), Chr(Idx))       'replace hex data with single character code (SHIFT is faster)
            Next
            For idx As Integer = &H10 To &HFF                               'process the whole 8-bit extended ASCII gambit
                Msg.Replace("=" & Hex(idx), Chr(idx))                       'replace hex data with single character code
            Next
            Return Msg.ToString                                             'return result string
        End If
    End Function

Edit: My attempt at fixing the function (assuming it really is what causes the problem; I can't tell):

Public Shared Function DecodeQuotedPrintable(ByVal Message As String, Optional ByVal QuickClean As Boolean = False) As String
    'set up StringBuilder object with data stripped of any line continuation tags
    Dim Msg As New StringBuilder(Message.Replace("=" & vbCrLf, vbNullString))

    If QuickClean Then                                                  'perform a quick clean (clean up common basics)
        Return Msg.Replace("=" & vbCrLf, vbNullString).Replace("=0D", vbCr).Replace("=0A",
                           vbLf).Replace("=20", " ").Replace("=3D", "=").ToString
    Else                                                                'perform total cleaning
        'store 2-character hex values that require a leading "0"

        Msg.Replace("=" & vbCrLf, vbNullString).Replace("=0D", vbCr).Replace("=0A",
                           vbLf).Replace("=20", " ").Replace("=3D", "%$#@[EQUALS]@#$%").ToString()

        Dim HxData As String = "X0102030405060708090A0B0C0D0E0F"
        For Idx As Integer = 1 To &HF                                   'initially process codes 1-15, which require a leading zero
            Msg.Replace("=" & Mid(HxData, Idx << 1, 2), Chr(Idx))       'replace hex data with single character code (SHIFT is faster)
        Next
        For idx As Integer = &H10 To &HFF                               'process the whole 8-bit extended ASCII gambit
            Msg.Replace("=" & Hex(idx), Chr(idx))                       'replace hex data with single character code
        Next

        Msg.Replace("%$#@[EQUALS]@#$%", "=")

        Return Msg.ToString                                             'return result string
    End If
End Function

Solution

  • "ƒ" is the character at byte value 0x83 in the Windows-1252 character set, so a decoder that treats =83 as a quoted-printable escape turns it into "ƒ" under that code page.
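
One likely reason the function in the question produces "ƒ" is that it runs one Replace per hex value in sequence: =3D is turned into = first, and the resulting =83 is then decoded a second time. The following is a minimal sketch of a single-pass decoder that avoids this, not a definitive implementation. The name DecodeQuotedPrintableOnce, the sample input, and the choice of UTF-8 are assumptions made for illustration; in a real message the charset should come from the Content-Type header.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text
Imports Microsoft.VisualBasic

Module QuotedPrintableSketch

    ' Hypothetical helper (not the asker's function): decodes quoted-printable
    ' text in one left-to-right pass, so each "=XX" escape is decoded exactly once.
    Public Function DecodeQuotedPrintableOnce(ByVal encoded As String, ByVal charset As Encoding) As String
        Dim bytes As New List(Of Byte)()
        Dim i As Integer = 0

        While i < encoded.Length
            Dim c As Char = encoded(i)

            If c = "="c AndAlso i + 2 < encoded.Length AndAlso
               encoded(i + 1) = ControlChars.Cr AndAlso encoded(i + 2) = ControlChars.Lf Then
                ' Soft line break ("=" followed by CRLF): emit nothing.
                i += 3
            ElseIf c = "="c AndAlso i + 2 < encoded.Length AndAlso
                   Uri.IsHexDigit(encoded(i + 1)) AndAlso Uri.IsHexDigit(encoded(i + 2)) Then
                ' "=XX" escape: emit the single byte it stands for.
                bytes.Add(Convert.ToByte(encoded.Substring(i + 1, 2), 16))
                i += 3
            Else
                ' Literal character; quoted-printable bodies are assumed to be 7-bit ASCII.
                bytes.Add(CByte(AscW(c) And &HFF))
                i += 1
            End If
        End While

        ' Interpret the collected bytes in the message's charset only at the very end.
        Return charset.GetString(bytes.ToArray())
    End Function

    Sub Main()
        ' Assumed raw body: each "=" of the URL arrives encoded as "=3D".
        Dim raw As String = "?u=3D83n74tn5187r&key=3D6e6e0936c4e6c48be56a72eba8964df0"
        Console.WriteLine(DecodeQuotedPrintableOnce(raw, Encoding.UTF8))
        ' Prints: ?u=83n74tn5187r&key=6e6e0936c4e6c48be56a72eba8964df0
    End Sub

End Module
```

Collecting bytes first and converting them with the charset at the end also lets multi-byte sequences (for example UTF-8 accented characters) decode correctly, which a per-escape Chr() replacement cannot do.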