
Mail subject decoding?


I have an email subject like this:

Subject: =?gbk?Q?=B3=F6=C3=C0=C1=E2=C7=BF=C1=A6=B3=E9=CA=AA=BB=FA=D2=BB=CC=A8?= =?gbk?Q?=A3=AC=D6=E9=BA=A3=B9=E3=D6=DD=C9=FA=BB=EE=B1=D8=B1=B8?=

But I don't know what kind of encoding this is. Could someone help? I'm new to email protocols.


Solution

  • This subject is encoded in GBK, an extension of the GB2312 character set for simplified Chinese characters, used in the People's Republic of China.

    As defined in RFC 1342 (since superseded by RFC 2047), non-ASCII text in Internet message headers has to be encoded with the MIME encoded-word syntax:

    encoded-word = "=" "?" charset "?" encoding "?" encoded-text "?" "="

    charset = token ; legal charsets defined by RFC 1341

    encoding = token ; Either "B" or "Q"

    token = 1*<Any CHAR except SPACE, CTLs, and tspecials>

    tspecials = "(" / ")" / "<" / ">" / "@" / "," / ";" / ":" / "\" / <"> / "/" / "[" / "]" / "?" / "." / "="

    encoded-text = 1*<Any printable ASCII character other than "?" or SPACE> ; (but see "Use of encoded-words in message headers", below)
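
    As a rough illustration of the grammar above, the three fields of an encoded-word can be pulled apart with a regular expression. This is only a sketch: the pattern is a simplification of the `token` rule (it does not exclude every tspecial), and `split_encoded_words` is a hypothetical helper name, not part of any library.

```python
import re

# Simplified pattern for  =?charset?encoding?encoded-text?=
# Assumption: charset and encoded-text contain no "?" and no whitespace.
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([BbQq])\?([^?\s]+)\?=")


def split_encoded_words(header_value):
    """Return a list of (charset, encoding, encoded_text) tuples."""
    return [match.groups() for match in ENCODED_WORD.finditer(header_value)]


if __name__ == "__main__":
    sample = "=?gbk?Q?=B3=F6?= =?gbk?Q?=A3=AC?="
    for charset, encoding, text in split_encoded_words(sample):
        print(charset, encoding, text)
```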


    The "B" encoding:

    The "B" encoding is identical to the "BASE64" encoding defined by RFC 1341.
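
    To make the "B" case concrete: since the encoded-text is plain Base64, decoding it is a Base64 decode followed by a charset decode. The sketch below assumes the charset name is one that Python's codecs recognise; `b_decode` is a hypothetical helper.

```python
import base64


def b_decode(encoded_text, charset):
    """Decode the encoded-text part of a "B" encoded-word into a string."""
    raw_bytes = base64.b64decode(encoded_text)
    return raw_bytes.decode(charset)


if __name__ == "__main__":
    # Corresponds to the encoded-word =?UTF-8?B?SGVsbG8=?=
    print(b_decode("SGVsbG8=", "utf-8"))
```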

    The "Q" encoding:

    The "Q" encoding is similar to the "Quoted-Printable" content-
    transfer-encoding defined in RFC 1341. It is designed to allow text
    containing mostly ASCII characters to be decipherable on an ASCII
    terminal without decoding.

    (1) Any 8-bit value may be represented by a "=" followed by two hexadecimal digits. For example, if the character set in use were ISO-8859-1, the "=" character would thus be encoded as "=3D", and a SPACE by "=20". (Upper case should be used for hexadecimal digits "A" through "F".)

    (2) The 8-bit hexadecimal value 20 (e.g., ISO-8859-1 SPACE) may be represented as "_" (underscore, ASCII 95.). (This character may not pass through some internetwork mail gateways, but its use will greatly enhance readability of "Q" encoded data with mail readers that do not support this encoding.) Note that the "_" always represents hexadecimal 20, even if the SPACE character occupies a different code position in the character set in use.

    (3) 8-bit values which correspond to printable ASCII characters other than "=", "?", and "_" (underscore), MAY be represented as those characters. (But see section 5 for restrictions.) In particular, SPACE and TAB MUST NOT be represented as themselves within encoded words.
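
    The three rules above translate fairly directly into code. The following is a minimal sketch of a "Q" decoder, not a validating parser: it assumes well-formed input (every "=" is followed by two hex digits, and all other characters are ASCII), and `q_decode` is a hypothetical name.

```python
def q_decode(encoded_text):
    """Turn the encoded-text of a "Q" encoded-word back into raw bytes."""
    out = bytearray()
    i = 0
    while i < len(encoded_text):
        ch = encoded_text[i]
        if ch == "=":
            # Rule (1): "=" followed by two hexadecimal digits is one byte.
            out.append(int(encoded_text[i + 1:i + 3], 16))
            i += 3
        elif ch == "_":
            # Rule (2): underscore always stands for hexadecimal 20.
            out.append(0x20)
            i += 1
        else:
            # Rule (3): other printable ASCII characters represent themselves.
            out.append(ord(ch))
            i += 1
    return bytes(out)


if __name__ == "__main__":
    raw_bytes = q_decode("=B3=F6=C3=C0")
    # The charset from the encoded-word (gbk here) says how to read the bytes.
    print(raw_bytes.decode("gbk"))
```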

    In your subject:

    Subject: =?gbk?Q?=B3=F6=C3=C0=C1=E2=C7=BF=C1=A6=B3=E9=CA=AA=BB=FA=D2=BB=CC=A8?= =?gbk?Q?=A3=AC=D6=E9=BA=A3=B9=E3=D6=DD=C9=FA=BB=EE=B1=D8=B1=B8?=

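    The Q between the question marks shows that the "Q" encoding (the Quoted-Printable variant described above) has been used, hence the presence of = as the escape character instead of %. The charset field, gbk, tells you how to interpret the decoded bytes. The whitespace separating the two adjacent encoded-words is ignored when the decoded text is displayed.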

    You can find an online encoder here, and an online MIME Headers Decoder here.

    Finally, here is your decoded subject:

    Subject: 出美菱强力抽湿机一台，珠海广州生活必备
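
    If you would rather reproduce this result locally than rely on an online tool, Python's standard library can do it. The sketch below uses `email.header.decode_header`; the `decode_subject` wrapper is a hypothetical helper, and falling back to ASCII for fragments without a charset is an assumption made for this example.

```python
from email.header import decode_header

RAW_SUBJECT = (
    "=?gbk?Q?=B3=F6=C3=C0=C1=E2=C7=BF=C1=A6=B3=E9=CA=AA=BB=FA=D2=BB=CC=A8?= "
    "=?gbk?Q?=A3=AC=D6=E9=BA=A3=B9=E3=D6=DD=C9=FA=BB=EE=B1=D8=B1=B8?="
)


def decode_subject(raw_value):
    """Decode a header value that may contain MIME encoded-words."""
    parts = []
    for fragment, charset in decode_header(raw_value):
        if isinstance(fragment, bytes):
            # Encoded-word payload: interpret the bytes in the declared charset.
            parts.append(fragment.decode(charset or "ascii"))
        else:
            # Unencoded text is already a str.
            parts.append(fragment)
    return "".join(parts)


if __name__ == "__main__":
    print(decode_subject(RAW_SUBJECT))
```

    Note that the bytes =A3=AC decode to the fullwidth comma (U+FF0C) in GBK, not the ASCII comma.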