While RFC 2045 clearly states that a line in quoted-printable (QP) must not be longer than 76 characters, in the real world not every client seems to follow this requirement. Or am I misunderstanding the requirement in the RFC?
Consider the following few lines from a real-world mail message:
<style type=3D"text/css">=0Abody,td { color:#2f2f2f; font:11px/1.35em Verdana, Arial, Helvetica, sans-serif; }=0A</style>=0A<body style=3D"background:#F6F6F6; font-family:Verdana, Arial, Helvetica, sa=
ns-serif; font-size:12px; margin:0; padding:0;">=0D=0A<div style=3D"background:#F6F6F6; font-family:Verdana, Arial, Helvetica, sans-serif; font-size:12px; margin:0; padding:0;">=0D=0A<table cellspacin=
g=3D"0" cellpadding=3D"0" border=3D"0" width=3D"100%">=0D=0A<tr>=0D=0A <td align=3D"center" valign=3D"top" style=3D"padding:20px 0 20px 0">=0D=0A <!-- [ header starts here] -->=0D=0A =
Each line is 201 characters plus the CRLF. However, there are several =0A sequences, which translate to LF. So does that mean I need to be able to parse this message, or can I reject it?
It seems to me that it violates the following statement from the RFC, but I am not 100% certain:
(5) (Soft Line Breaks) The Quoted-Printable encoding
REQUIRES that encoded lines be no more than 76
characters long. If longer lines are to be encoded
with the Quoted-Printable encoding, "soft" line breaks
must be used. An equal sign as the last character on a
encoded line indicates such a non-significant ("soft")
line break in the encoded text.
You should be able to parse this message although the longest line contains 128 symbols.
There are =0A and =SPACE sequences in this message. =0A is a meaningful line break and =SPACE is a soft line break.
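To make the difference concrete, here is a minimal sketch of a lenient decoder in Python. The name decode_qp is hypothetical, and it assumes that an equal sign at the end of a line, optionally followed by spaces or tabs, should be treated as a soft line break. It does not enforce the 76-character limit, so it would accept lines like the ones in the question.

```python
import re

# "=" followed by two hex digits (an escaped byte), or "=" followed by
# optional spaces/tabs and a line ending (a soft line break).
_QP_TOKEN = re.compile(rb"=(?:([0-9A-Fa-f]{2})|[ \t]*\r?\n)")


def decode_qp(data: bytes) -> bytes:
    """Decode quoted-printable data without checking line lengths."""

    def replace(match):
        hex_pair = match.group(1)
        if hex_pair:
            # Escaped byte, e.g. =0A becomes a real LF and =3D becomes "=".
            return bytes([int(hex_pair, 16)])
        # Soft line break: it carries no meaning, so drop it.
        return b""

    return _QP_TOKEN.sub(replace, data)


sample = b"font-family:Verdana, sa=\r\nns-serif;=0A</style>"
decoded = decode_qp(sample)
# decoded == b"font-family:Verdana, sans-serif;\n</style>"
```

The soft break in the middle of "sans-serif" disappears, while =0A survives as a real line feed in the decoded text.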
Hard line breaks should be CRLF (=0D=0A), but the linked RFC 2045 also permits LF alone (without CR):
(4) (Line Breaks) A line break in a text body, represented
as a CRLF sequence in the text canonical form, must be
represented by a (RFC 822) line break, which is also a
CRLF sequence, in the Quoted-Printable encoding. (...)
Note that many implementations may elect to encode the
local representation of various content types directly
rather than converting to canonical form first,
encoding, and then converting back to local
representation. In particular, this may apply to plain
text material on systems that use newline conventions
other than a CRLF terminator sequence. Such an
implementation optimization is permissible, but only
when the combined canonicalization-encoding step is
equivalent to performing the three steps separately.
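If you would rather log non-conforming messages than reject them, a small check along these lines could report which encoded lines exceed the limit. This is only a sketch: overlong_lines and MAX_QP_LINE are hypothetical names, and it assumes the limit is counted without the trailing line ending.

```python
MAX_QP_LINE = 76  # limit from RFC 2045, not counting the trailing CRLF


def overlong_lines(encoded: bytes, limit: int = MAX_QP_LINE):
    """Return (line_number, length) for every encoded line longer than limit."""
    result = []
    # splitlines() drops the line endings, so len(line) excludes CRLF or LF.
    for number, line in enumerate(encoded.splitlines(), start=1):
        if len(line) > limit:
            result.append((number, len(line)))
    return result


message = b"a" * 76 + b"\r\n" + b"b" * 77 + b"\r\n" + b"c=\r\n"
violations = overlong_lines(message)
# violations == [(2, 77)]
```

A parser could decode the message anyway and keep the list of violations for diagnostics.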