Is base64 encoding required when sending email with pdf attachment?


I want to send an email with a PDF attachment, and I found a perfect example here: https://zetcode.com/golang/email-smtp/

It works, but I don't see why the base64 encoding is necessary, so I dropped it and adjusted the headers. The new BuildMail function looks like this:

    func BuildMail(mail Mail) []byte {
        var buf bytes.Buffer

        buf.WriteString(fmt.Sprintf("From: %s\r\n", mail.Sender))
        buf.WriteString(fmt.Sprintf("To: %s\r\n", strings.Join(mail.To, ";")))
        buf.WriteString(fmt.Sprintf("Subject: %s\r\n", mail.Subject))

        boundary := "my-boundary-779"
        buf.WriteString("MIME-Version: 1.0\r\n")
        buf.WriteString(fmt.Sprintf("Content-Type: multipart/mixed; boundary=%s\n",
        boundary))

        buf.WriteString(fmt.Sprintf("\r\n--%s\r\n", boundary))
        buf.WriteString("Content-Type: text/plain; charset=\"utf-8\"\r\n")
        buf.WriteString(fmt.Sprintf("\r\n%s", mail.Body))

        buf.WriteString(fmt.Sprintf("\r\n--%s\r\n", boundary))
        buf.WriteString("Content-Type: application/pdf\r\n")
        buf.WriteString("Content-Disposition: attachment; filename=words.pdf\r\n")
        buf.WriteString("Content-ID: <words.pdf>\r\n\r\n")

        data := readFile("words.pdf")
        buf.Write(data)
        buf.WriteString(fmt.Sprintf("\r\n--%s", boundary))

        buf.WriteString("--")

        return buf.Bytes()
    }

But with this code I receive an empty PDF attachment. So is an attachment required to be base64-encoded when it is sent via SMTP? Why?


Solution

  • No, base64 specifically is not required. Some encoding (out of a limited set) is required, though, because SMTP is a "7-bit-clean" protocol: it is specified to carry only bytes from the US-ASCII character set, which defines codes in the range [0..127]. See the spec, which narrows the allowed range to [1..127].
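The range restriction above can be checked mechanically. The following is a minimal Go sketch, not part of the original example; the helper name `isSevenBitClean` is made up for illustration, and the sample bytes merely imitate how a typical PDF file begins.

```go
package main

import "fmt"

// isSevenBitClean reports whether every byte of data lies in [1..127],
// i.e. data contains neither NUL bytes nor bytes with the high bit set.
// (Hypothetical helper, named here only for illustration.)
func isSevenBitClean(data []byte) bool {
	for _, b := range data {
		if b == 0 || b > 127 {
			return false
		}
	}
	return true
}

func main() {
	text := []byte("Hello, world!\r\n")
	// Stand-in for the start of a PDF file: the header line is commonly
	// followed by a comment made of bytes above 127.
	pdfStart := []byte("%PDF-1.7\n%\xE2\xE3\xCF\xD3\n")

	fmt.Println(isSevenBitClean(text))     // true
	fmt.Println(isSevenBitClean(pdfStart)) // false
}
```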

    Since PDF is a binary format, a PDF document may contain any byte value in the range [0..255], including values outside [1..127]. Precisely because of that, any payload carried in SMTP messages (not only "attachments" but also plain human-readable text) must be encoded in some way that produces output composed solely of bytes in the range [1..127] or a narrower one.
    Base64 has this property, but so do quoted-printable, base36, UTF-7 and basically any other encoding, already invented or of your own devising, as long as it has that key property.
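Applied to the question's code, this means the attachment part would be encoded and labeled with a Content-Transfer-Encoding header so that the recipient knows how to decode it. Below is a hedged sketch using Go's standard `encoding/base64` package. The function name `base64Part` is hypothetical, the content type is hard-coded to match the question, and lines are wrapped at 76 characters, the limit MIME sets for base64 bodies.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

// base64Part builds one MIME part carrying data as a base64-encoded PDF
// attachment. (Hypothetical helper; boundary and filename come from the caller.)
func base64Part(boundary, filename string, data []byte) []byte {
	var sb strings.Builder
	fmt.Fprintf(&sb, "\r\n--%s\r\n", boundary)
	sb.WriteString("Content-Type: application/pdf\r\n")
	// Tells the receiving mail client how to turn the body back into bytes.
	sb.WriteString("Content-Transfer-Encoding: base64\r\n")
	fmt.Fprintf(&sb, "Content-Disposition: attachment; filename=\"%s\"\r\n\r\n", filename)

	// Encode, then wrap the output at 76 characters per line.
	encoded := base64.StdEncoding.EncodeToString(data)
	const lineLen = 76
	for len(encoded) > lineLen {
		sb.WriteString(encoded[:lineLen])
		sb.WriteString("\r\n")
		encoded = encoded[lineLen:]
	}
	sb.WriteString(encoded)
	return []byte(sb.String())
}

func main() {
	// Stand-in for readFile("words.pdf") from the question.
	data := []byte("%PDF-1.7\n%\xE2\xE3\xCF\xD3\n")
	part := base64Part("my-boundary-779", "words.pdf", data)
	// Print the part followed by the closing boundary.
	fmt.Printf("%s\r\n--my-boundary-779--\r\n", part)
}
```

In the question's BuildMail, something like `buf.Write(base64Part(boundary, "words.pdf", data))` could stand in for the lines that write the attachment headers and the raw file bytes.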

    Of course, there is then the question of using an encoding the intended recipients can understand (note that mail transport agents are blissfully unaware of how you encode your payloads, as long as the end result is 7-bit clean). It basically boils down to what you're after. If you're generating mail messages intended to be read by humans using their MUAs, then base64 and quoted-printable are universally supported, and it is sensible to use one of these. If the recipient is a service (a program) you control, you can use absolutely any encoding with the necessary properties.

    The choice between, say, base64 and quoted-printable is more subtle when it comes to the overhead the encoding adds (the ratio of the size of the encoded representation to the size of the original raw blob). For plain English text with minuscule bits of non-ASCII content, such as the "proper" spelling of the word "naïve", QP easily wins over base64, because it encodes only those "funky" characters and leaves the regular characters as is. For "mostly-binary" data such as PDFs and ZIP archives (documents produced by popular contemporary "office" suites are all ZIP archives in disguise), base64 wins.
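The overhead comparison above can be tried out with Go's standard `mime/quotedprintable` and `encoding/base64` packages. This is only a rough sketch: the helper names `qpLen` and `base64Len` are made up, the binary sample is synthetic (every byte above 127), and the base64 figure ignores line breaks.

```go
package main

import (
	"bytes"
	"encoding/base64"
	"fmt"
	"mime/quotedprintable"
)

// qpLen returns the size of data after quoted-printable encoding.
func qpLen(data []byte) int {
	var buf bytes.Buffer
	w := quotedprintable.NewWriter(&buf)
	w.Write(data) // writing to a bytes.Buffer does not fail
	w.Close()     // flushes any pending output
	return buf.Len()
}

// base64Len returns the size of data after base64 encoding,
// not counting the line breaks a mail message would add.
func base64Len(data []byte) int {
	return base64.StdEncoding.EncodedLen(len(data))
}

func main() {
	text := []byte("A naïve approach works fine here.")

	// Synthetic "binary" sample: 300 bytes, all with the high bit set.
	binary := make([]byte, 300)
	for i := range binary {
		binary[i] = byte(128 + i%128)
	}

	fmt.Printf("text:   raw=%d qp=%d base64=%d\n", len(text), qpLen(text), base64Len(text))
	fmt.Printf("binary: raw=%d qp=%d base64=%d\n", len(binary), qpLen(binary), base64Len(binary))
}
```

For the text sample the quoted-printable size should come out smaller than the base64 size, and for the binary sample the other way round, since quoted-printable spends three characters on every byte it has to escape.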

    Further reading.