
base64 encoding function inserting a newline every 72 characters?


Hi, I was working on base64-encoding audio files (to upload to a server) on an ESP32 and came across this implementation, part of Espressif's ESP-IDF:

unsigned char * base64_encode(const unsigned char *src, size_t len,
                  size_t *out_len)
{
    unsigned char *out, *pos;
    const unsigned char *end, *in;
    size_t olen;
    int line_len;

    olen = len * 4 / 3 + 4; /* 3-byte blocks to 4-byte */
    olen += olen / 72; /* line feeds */
    olen++; /* nul termination */
    if (olen < len)
        return NULL; /* integer overflow */
    out = os_malloc(olen);
    if (out == NULL)
        return NULL;

    end = src + len;
    in = src;
    pos = out;
    line_len = 0;
    while (end - in >= 3) {
        *pos++ = base64_table[in[0] >> 2];
        *pos++ = base64_table[((in[0] & 0x03) << 4) | (in[1] >> 4)];
        *pos++ = base64_table[((in[1] & 0x0f) << 2) | (in[2] >> 6)];
        *pos++ = base64_table[in[2] & 0x3f];
        in += 3;
        line_len += 4;
        if (line_len >= 72) { // HERE
            *pos++ = '\n';
            line_len = 0;
        }
    }

    if (end - in) {
        *pos++ = base64_table[in[0] >> 2];
        if (end - in == 1) {
            *pos++ = base64_table[(in[0] & 0x03) << 4];
            *pos++ = '=';
        } else {
            *pos++ = base64_table[((in[0] & 0x03) << 4) |
                          (in[1] >> 4)];
            *pos++ = base64_table[(in[1] & 0x0f) << 2];
        }
        *pos++ = '=';
        line_len += 4;
    }

    if (line_len)
        *pos++ = '\n';

    *pos = '\0';
    if (out_len)
        *out_len = pos - out;
    return out;
}

I see that there is a variable called line_len, which is used to insert a newline every 72 characters and also one at the end:

if (line_len)
    *pos++ = '\n';

Because of this, when I test the function, I always get a newline appended to the encoded strings. When I compare the encoded strings with the output of Python's base64 module (base64.b64encode(b'blah')), the Python strings DO NOT have any newline appended.
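To make the newline behaviour concrete, here is a minimal sketch that counts how many newlines the function above writes for a given input length. The name count_newlines is hypothetical (it is not part of ESP-IDF), and the arithmetic is simply read off the quoted loop: 18 three-byte groups produce 72 output characters.

```c
#include <stddef.h>

/* Hypothetical helper, not part of ESP-IDF: returns the number of '\n'
 * characters the quoted base64_encode() writes for `len` input bytes. */
static size_t count_newlines(size_t len)
{
    size_t full_groups = len / 3;             /* iterations of the while loop */
    size_t in_loop = full_groups / 18;        /* 18 groups * 4 chars = 72 chars */
    size_t line_len = (full_groups % 18) * 4; /* chars on the last, partial line */

    if (len % 3 != 0)
        line_len += 4;                        /* padded final group */

    /* A trailing newline is written only if the last line is non-empty. */
    return in_loop + (line_len != 0 ? 1 : 0);
}
```

For the 4-byte input b'blah' this gives 1 (the trailing newline). An input of exactly 54 bytes also gives 1, because the newline written inside the loop already ends the output, and an empty input gives 0.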

Is there a reason for this decision? I recall 72 characters per line being a constraint of old computer terminals in the early days of email. Does that have anything to do with this?


Solution

  • Base64 itself does not require newlines, however long the output gets, but some applications (email in particular) absolutely require them, so it makes sense for an implementation to include them.

    base64 was originally designed as a content-transfer encoding for email, with the specific requirement (in MIME, RFC 2045) that no encoded line may be longer than 76 characters; a 72-character line stays within that limit.

    Just to spell this out, newlines are permitted, and so the implementation you are looking at is conformant with the spec.

    If the newlines disturb you, just take them out. A proper test would ignore them when comparing the results from two different base64 encoders anyway. There is no requirement to make lines exactly 72 characters long, so some implementations play it safe and limit lines to e.g. 64 characters.
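As a rough sketch of "take them out" and "ignore them when comparing", the two helpers below work on NUL-terminated strings. Both names (strip_line_breaks and base64_equal_ignoring_breaks) are hypothetical and not part of ESP-IDF; the encoder in the question returns unsigned char *, so its result would need a cast to char * before being passed in.

```c
#include <stddef.h>

/* Hypothetical helper: removes '\r' and '\n' from a NUL-terminated
 * base64 string in place and returns the new length. */
static size_t strip_line_breaks(char *s)
{
    char *dst = s;

    for (const char *src = s; *src != '\0'; src++) {
        if (*src != '\n' && *src != '\r')
            *dst++ = *src;
    }
    *dst = '\0';
    return (size_t)(dst - s);
}

/* Hypothetical helper: returns 1 if the two base64 strings are equal
 * once line breaks are ignored, 0 otherwise. */
static int base64_equal_ignoring_breaks(const char *a, const char *b)
{
    for (;;) {
        while (*a == '\n' || *a == '\r')
            a++;
        while (*b == '\n' || *b == '\r')
            b++;
        if (*a != *b)
            return 0;
        if (*a == '\0')
            return 1;
        a++;
        b++;
    }
}
```

With these, "YmxhaA==\n" (the ESP32 output for b'blah') and "YmxhaA==" (the Python output) compare as equal, and stripping the former in place leaves the 8-character string "YmxhaA==".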