
Mail Subject decoding


Why does decoding the following fail

Subject: =?ISO-8859-1?Q?Begr=FC=DFungsschreibe?=n

while decoding the following works?

Subject: =?ISO-8859-1?Q?Begr=FC=DFungsschreiben?=

I read https://www.rfc-editor.org/rfc/rfc2047 but could not find any statement that whitespace is required after the end marker ?=. That is the only difference between the two lines: in the first line (the one that fails), the n follows the end marker directly.

I used Python 2.7 for decoding.

I searched for a related bug in Python, but could only find resolved issues.


Solution

  • RFC 2047, section 5.1:

    an 'encoded-word' that appears in a header field defined as '*text' MUST be separated from any adjacent 'encoded-word' or 'text' by 'linear-white-space'.

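    The Subject header is defined as *text (and in any case the errata add the same restriction more generally to section 2). In the first example the n directly follows ?=, so the encoded-word is not separated from the adjacent text, and a strict decoder leaves it undecoded.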
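
The delimiter rule quoted above can be illustrated with a small check. This is only a sketch: the pattern STRICT_ENCODED_WORD and the helper is_delimited_encoded_word are hypothetical names made up for this example, not part of Python's email package, and the pattern is a simplified reading of RFC 2047 rather than a full implementation. Python 2.7's own header parser appears to apply a comparable "whitespace or end of string" condition after ?=, which would explain the behaviour in the question.

```python
import re

# Hypothetical, simplified pattern (an assumption for illustration):
# an encoded-word must start at the beginning of the value or after
# whitespace, and must be followed by whitespace or the end of the value.
STRICT_ENCODED_WORD = re.compile(
    r'(?:^|(?<=[ \t]))'      # start of value, or preceded by space/tab
    r'=\?([^?\s]+)'          # "=?" followed by the charset
    r'\?([QqBb])'            # "?" followed by the encoding (Q or B)
    r'\?([^?\s]*)'           # "?" followed by the encoded text
    r'\?='                   # closing "?="
    r'(?=[ \t]|$)'           # followed by space/tab or end of value
)


def is_delimited_encoded_word(value):
    """Return True if value contains a properly delimited encoded-word."""
    return STRICT_ENCODED_WORD.search(value) is not None


# The working subject from the question: "?=" is at the end of the value.
print(is_delimited_encoded_word('=?ISO-8859-1?Q?Begr=FC=DFungsschreiben?='))
# The failing subject: "n" follows "?=" with no whitespace in between.
print(is_delimited_encoded_word('=?ISO-8859-1?Q?Begr=FC=DFungsschreibe?=n'))
```

Under this reading, the first call reports a valid encoded-word and the second does not, because the trailing n makes the token ordinary text as far as the strict rule is concerned.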