
In quoted-printable, what constitutes a line break according to the 76 character rule?


Although RFC 2045 clearly states that a line in quoted-printable (QP) must not be longer than 76 characters, in the real world not every client seems to follow this requirement. Or am I misunderstanding the requirement in the RFC?

Consider the following few lines from a real-world mail message:

<style type=3D"text/css">=0Abody,td { color:#2f2f2f; font:11px/1.35em Verdana, Arial, Helvetica, sans-serif; }=0A</style>=0A<body style=3D"background:#F6F6F6; font-family:Verdana, Arial, Helvetica, sa=
ns-serif; font-size:12px; margin:0; padding:0;">=0D=0A<div style=3D"background:#F6F6F6; font-family:Verdana, Arial, Helvetica, sans-serif; font-size:12px; margin:0; padding:0;">=0D=0A<table cellspacin=
g=3D"0" cellpadding=3D"0" border=3D"0" width=3D"100%">=0D=0A<tr>=0D=0A    <td align=3D"center" valign=3D"top" style=3D"padding:20px 0 20px 0">=0D=0A        <!-- [ header starts here] -->=0D=0A       =

Each line is 201 characters plus the CRLF. However, there are several =0A sequences, which decode to LF. Does that mean I need to be able to parse this message, or can I reject it?

It seems to me that it violates the following statement from the RFC, but I am not 100% certain:

(5)   (Soft Line Breaks) The Quoted-Printable encoding
      REQUIRES that encoded lines be no more than 76
      characters long.  If longer lines are to be encoded
      with the Quoted-Printable encoding, "soft" line breaks
      must be used.  An equal sign as the last character on a
      encoded line indicates such a non-significant ("soft")
      line break in the encoded text.
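
To make the rule concrete, here is a minimal Python sketch of how a strict reader could measure encoded line length. The helper name `overlong_lines` and the sample bytes are made up for illustration. It assumes lines are delimited only by a literal CRLF in the encoded data, so an encoded `=0A` counts as three ordinary characters within a line, and that the limit excludes the trailing CRLF but includes any equal signs.

```python
def overlong_lines(raw: bytes, limit: int = 76) -> list:
    """Return (line_number, length) for each encoded line longer than `limit`."""
    offenders = []
    # Split on the literal CRLF only; =0A and =0D=0A stay inside the line.
    for number, line in enumerate(raw.split(b"\r\n"), start=1):
        if len(line) > limit:
            offenders.append((number, len(line)))
    return offenders


# Hypothetical sample: a 76-character line, then a 79-character line
# (75 letters + "=0A" + the trailing "=" of a soft line break).
sample = b"a" * 76 + b"\r\n" + b"b" * 75 + b"=0A=\r\n" + b"c" * 10
print(overlong_lines(sample))
```

Under this reading, each line of the message above would be reported, because the `=0A` sequences do not shorten the encoded line.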

Solution

  • You should be able to parse this message, although the longest line contains 128 symbols.

    There are =0A and =SPACE sequences in this message.
    =0A is a meaningful line break and =SPACE is a soft line break.
    Hard line breaks should be CRLF (=0D=0A), but the linked RFC 2045
    also permits a bare LF (without CR):

    (4)   (Line Breaks) A line break in a text body, represented
          as a CRLF sequence in the text canonical form, must be
          represented by a (RFC 822) line break, which is also a
          CRLF sequence, in the Quoted-Printable encoding.  (...)
    
          Note that many implementations may elect to encode the
          local representation of various content types directly
          rather than converting to canonical form first,
          encoding, and then converting back to local
          representation.  In particular, this may apply to plain
          text material on systems that use newline conventions
          other than a CRLF terminator sequence.  Such an
          implementation optimization is permissible, but only
          when the combined canonicalization-encoding step is
          equivalent to performing the three steps separately.
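
As a rough illustration of how a lenient decoder treats these sequences, Python's standard `quopri` module can be run on a fragment modelled on the sample message. The fragment is shortened for illustration and is not the original bytes; the variable names are arbitrary.

```python
import quopri

# Shortened fragment modelled on the sample message (not the original bytes).
encoded = b'<body style=3D"font-family:sa=\r\nns-serif;">=0D=0A<tr>=0A'

decoded = quopri.decodestring(encoded)

# "=" followed by CRLF (a soft line break) disappears, =3D becomes "=",
# =0D=0A becomes CRLF, and =0A becomes a bare LF.
print(decoded)
```

This sketch does not check line length at all, so the same call would decode over-long lines too. Whether to accept or reject such input remains a policy decision for the receiving application.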