c for-loop shared-libraries esp32 arduino-ide

Why does for loop behavior change when the debug statement Serial.println(i); is present vs. commented out?


I have written a base 64 encoding / decoding library for the Arduino IDE (yes, I am aware such libraries already exist. This is for my own education as much as anything practical). My target microcontroller is an Espressif ESP32.

In my base64 decoding function, the index variable i is affected by the presence of a Serial.println() debug statement. It is the strangest thing I have ever seen and I cannot figure out why it should make a difference if there is debug printing or not.

Using the test program base64.ino and the b64.cpp functions below, here are two samples of my serial output. In the first example, I'm using a Serial.println(i); in the function b64dec() just after the for loop. In the second example, the Serial.println(i); is commented out. That is the only difference and I get two drastically different results.

Am I missing something here? Is it a compiler optimization gone wonky? My understanding of C variable scoping is that there are only global, function, and parameter levels. The i in the for loop should be the same as the int i = 0; a few lines above it. I don't believe it's a buffer overflow either, since the debugging output of 13 for the decoded length is accurate for a 12 character message and its NULL terminator.

This is what I expect to get any time I run it:

Hello World!
...is encoded as...
SGVsbG8gV29ybGQhAA==

SGVsbG8gV29ybGQhAA==
Size of encoded message (including NULL terminator): 21
Bytes needed for decoded message: 13
Looping until i < 15
0
4
8
12
Bytes decoded: 13
72 101 108 108 111 32 87 111 114 108 100 33 0
Decoded message: Hello World!

This is what I get when Serial.println(i); is commented out:

Hello World!
...is encoded as...
SGVsbG8gV29ybGQhAA==

SGVsbG8gV29ybGQhAA==
Size of encoded message (including NULL terminator): 21
Bytes needed for decoded message: 13
Looping until i < 15
Bytes decoded: 4
72 101 108 108
Decoded message: Hell

I am quite literally stuck in hell.

As you can see from the other debug output, it should be looping until i<15, yet it only makes it to 4.

Comparing my decode function, b64dec(), to my encode function, b64enc(), the way of looping is very similar. Yet, b64enc() does not need a debug Serial.println() to make it work.

Any suggestions on what I might be missing here would be appreciated.

Here is the code:

base64.ino -- this is the test program calling the library functions.

#include <b64.h>

void setup() {
  Serial.begin(115200);
  delay(1000);

  // Encoding example.
  char msg1[] = "Hello World!";
  char result1[b64enclen(sizeof msg1)];
  b64enc(msg1, result1, sizeof msg1);
  Serial.println(msg1);
  Serial.println("...is encoded as...");
  Serial.println(result1);
  Serial.println();

  // Decoding example.
  char enc_msg[] = "SGVsbG8gV29ybGQhAA==";
  Serial.println(enc_msg);
  Serial.print("Size of encoded message (including NULL terminator): ");
  Serial.println(sizeof enc_msg);
  Serial.print("Bytes needed for decoded message: ");
  Serial.println(b64declen(enc_msg, sizeof enc_msg));
  char dec_result[b64declen(enc_msg, sizeof enc_msg)];
  int declen = b64dec(enc_msg, dec_result, sizeof enc_msg);
  Serial.print("Bytes decoded: ");
  Serial.println(declen, DEC);
  for (int k=0; k<declen; k++) {
    Serial.print(dec_result[k], DEC);
    Serial.print(" ");
  }
  Serial.println();
  Serial.print("Decoded message: ");
  Serial.println(dec_result);
}
  
void loop() {
  
}

b64.cpp -- Note: this is straight C, but the linker won't find it unless it's got a .cpp extension.

/*
  base64 functions for turning binary data into strings and back again.
  Created April 2021 - David Horton and released to public domain.
  
  Permission to use, copy, modify, and/or distribute this software for any
  purpose with or without fee is hereby granted.

  THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
  EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
  MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
  IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR
  OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
  ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
  OTHER DEALINGS IN THE SOFTWARE.
*/

#include <b64.h>
#include <Arduino.h>

// b64map - 6-bit index selects the correct character from the base64 
// 'alphabet' as described in RFC4648. Also used for decoding functions.
const char b64map[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// b64pad - padding character, also described in RFC4648.
const char b64pad = '=';

size_t b64enclen(size_t unenc_len) {
  size_t enc_len;

  // Padding the unencoded length by 2 for worst case makes calculating
  // easier. If it turns out padding is not needed, integer division will
  // drop the fractional part. The result is the correct length without any
  // of the hassle dealing with padding.
  unenc_len += 2;
  
  // Encoded is four-thirds the unencoded length. Add 1 for NULL terminator.
  enc_len = (unenc_len / 3 * 4) + 1;

  return enc_len;
}

int b64enc(char *unenc, char *enc, size_t unenc_len) {
  unsigned char buffer[4];  // Temp storage for mapping three bytes to four characters.

  // Any input not evenly divisible by three requires padding at the end.
  // Determining what remainder exists after dividing by three helps when
  // dealing with those special cases. 
  int remainder = unenc_len %3;
  
  // Loop through unencoded characters in sets of three at a time. Any one or
  // two characters remaining are dealt with at the end to properly determine
  // their padding.
  int i = 0;
  int j = 0;
  for (i=0; i<unenc_len - remainder; i+=3) {  // Minus padding remainder.

    // Take three bytes of eight bits and map onto four chars of six bits.
    // E.g. ABCDEFGH IJKLMNOP QRSTUVWX => 00ABCDEF 00GHIJKL 00MNOPQR 00STUVWX
    buffer[0] = unenc[i] >> 2;
    buffer[1] = (unenc[i] & 0B00000011) << 4 | unenc[i+1] >> 4;
    buffer[2] = (unenc[i+1] & 0B00001111) << 2 | unenc[i+2] >> 6;
    buffer[3] = unenc[i+2] & 0B00111111;

    // Map the six-bit bytes onto the ASCII characters used in base64.
    enc[j++] = b64map[buffer[0]];
    enc[j++] = b64map[buffer[1]];
    enc[j++] = b64map[buffer[2]];
    enc[j++] = b64map[buffer[3]];
  }

  // The remaining characters are handled differently, because there could
  // be padding. The amount of padding depends upon if there are one or two
  // characters left over.
  switch (remainder) {
    case 2:
      buffer[0] = unenc[i] >> 2;
      buffer[1] = (unenc[i] & 0B00000011) << 4 | unenc[i+1] >> 4;
      buffer[2] = (unenc[i+1] & 0B00001111) << 2 | unenc[i+2] >> 6;
      enc[j++] = b64map[buffer[0]];
      enc[j++] = b64map[buffer[1]];
      enc[j++] = b64map[buffer[2]];
      enc[j++] = b64pad;
      break;
    case 1:
      buffer[0] = unenc[i] >> 2;
      buffer[1] = (unenc[i] & 0B00000011) << 4;
      enc[j++] = b64map[buffer[0]];
      enc[j++] = b64map[buffer[1]];
      enc[j++] = b64pad;
      enc[j++] = b64pad;
      break;
  }

  // Finish with a NULL terminator since the encoded result is a string.
  enc[j] = '\0';
 
  return j;
}

size_t b64declen(char * enc, size_t enc_len) {
  size_t dec_len;
  
  // Any C-style string not ending with a NULL terminator is invalid.
  // Remember to subtract one from the length due to zero indexing.
  if (enc[enc_len - 1] != '\0') return 0;
  
  // Even a single byte encoded to base64 results in a four-character
  // string (two chars, two padding). Anything less is invalid.
  if (enc_len < 4) return 0;

  // Padded base64 string lengths are always divisible by four (after
  // subtracting the NULL terminator). Otherwise, they're not valid.
  if ((enc_len - 1) %4 != 0) return 0;

  // Maximum decoded length is three-fourths the encoded length.
  dec_len = ((enc_len - 1) / 4 * 3);

  // Padding characters don't count for decoded length.
  if (enc[enc_len - 2] == b64pad) dec_len--;
  if (enc[enc_len - 3] == b64pad) dec_len--;

  return dec_len;
}

int b64dec(char *enc, char *dec, size_t enc_len) {
  unsigned char buffer[4];  // Temp storage for mapping three bytes to four characters.

  // base64 encoded input should always be evenly divisible by four, due to
  // padding characters. If not, it's an error. Note: because base64 is held
  // in a C-style string, there's the NULL terminator to subtract first.
  if ((enc_len - 1) %4 != 0) return 0;

  int padded = 0;
  if (enc[enc_len - 2] == b64pad) padded++;
  if (enc[enc_len - 3] == b64pad) padded++;

  // Loop through encoded characters in sets of four at a time, because there
  // are four encoded characters for every three decoded characters. But, if
  // it's not evenly divisible by four leave the remaining as a special case.
  int i = 0;
  int j = 0;

  Serial.print("Looping until i < ");
  Serial.println(enc_len - padded - 4);

  for (i=0; i<enc_len - padded - 4; i+=4) {

    Serial.println(i);  // <-- This is the line that makes all the difference.

    // Take four chars of six bits and map onto four bytes of eight bits.
    // E.g. 00ABCDEF 00GHIJKL 00MNOPQR 00STUVWX => ABCDEFGH IJKLMNOP QRSTUVWX
    buffer[i] = strchr(b64map, enc[i]) - b64map;
    buffer[i+1] = strchr(b64map, enc[i+1]) - b64map;
    buffer[i+2] = strchr(b64map, enc[i+2]) - b64map;
    buffer[i+3] = strchr(b64map, enc[i+3]) - b64map;
    dec[j++] = buffer[i] << 2 | buffer[i+1] >> 4;
    dec[j++] = buffer[i+1] << 4 | buffer[i+2] >> 2;
    dec[j++] = buffer[i+2] << 6 | buffer[i+3];
  }

  // Take care of special case.
  switch (padded) {
    case 1:
      buffer[i] = strchr(b64map, enc[i]) - b64map;
      buffer[i+1] = strchr(b64map, enc[i+1]) - b64map;
      buffer[i+2] = strchr(b64map, enc[i+2]) - b64map;
      dec[j++] = buffer[i] << 2 | buffer[i+1] >> 4;
      dec[j++] = buffer[i+1] << 4 | buffer[i+2] >> 2;
      break;
    case 2:
      buffer[i] = strchr(b64map, enc[i]) - b64map;
      buffer[i+1] = strchr(b64map, enc[i+1]) - b64map;
      dec[j++] = buffer[i] << 2 | buffer[i+1] >> 4;
      break;
  }
  
  return j;
}

b64.h

/*
  base64 functions for turning binary data into strings and back again.
  Created April 2021 - David Horton and released to public domain.
  
  Permission to use, copy, modify, and/or distribute this software for any
  purpose with or without fee is hereby granted.

  THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
  EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
  MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
  IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR
  OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
  ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
  OTHER DEALINGS IN THE SOFTWARE.
*/

#ifndef B64_H
#define B64_H

#include <stddef.h>
#include <string.h>

/*
 * b64enclen
 *   Given number of binary bytes, calculate the number of characters needed
 *   to represent the data as base64. Include room for null terminator, since
 *   the results of encoding will be a C-style string.
 * Parameters:
 *   unenc_len - the length of the unencoded array of characters. A simple
 *    'sizeof unencoded' will work.
 * Returns:
 *   size_t number of characters required for base64 encoded message plus a
 *   NULL terminator. Suitable for array declarations such as:
 *   'char output[b64enclen(sizeof unencoded)]'
 */
size_t b64enclen(size_t unenc_len);

/*
 * b64enc
 *   Given an unencoded array of bytes (binary data) and its length, fill
 *   the encoded array with a base64 representation of the input. b64enclen()
 *   should be used to properly size the character array that will hold the
 *   encoded output.
 * Parameters:
 *   unenc - pointer to a character array with the contents to be encoded.
 *   enc - pointer to a byte array that will be filled with base64 output.
 *   unenc_len - length of the string pointed to by unenc. Can be found with
 *     'sizeof unenc'
 * Returns:
 *   Integer representing the number of bytes encoded.
 */
int b64enc(char *unenc, char *enc, size_t unenc_len);

/*
 * b64declen
 *   Given a base64 encoded string and its length, perform a number of 
 *   tests to validate the string. Then, calculate the number of bytes
 *   needed to represent the binary data after decoding.
 * Parameters:
 *   enc - a pointer to the base64 encoded string.
 *   enc_len - the length of the encoded string (i.e. 'sizeof enc'.)
 * Returns:
 *   size_t number of characters required for the decoded message or
 *   0 in the case of an invalid base64 encoded string.
 */
size_t b64declen(char * enc, size_t enc_len);

/*
 * b64dec
 *   Given a base64 encoded string and its length, fill the decoded array
 *   with the decoded binary representation of the input. b64declen() should
 *   be used to properly size the array intended to hold the decoded output.
 */
int b64dec(char *enc, char *dec, size_t enc_len);

#endif

The same thing happens with "Hello Cleveland!"

With Serial.println(i);:

Hello Cleveland!
...is encoded as...
SGVsbG8gQ2xldmVsYW5kIQA=

SGVsbG8gQ2xldmVsYW5kIQA=
Size of encoded message (including NULL terminator): 25
Bytes needed for decoded message: 17
0
4
8
12
16
Bytes decoded: 17
72 101 108 108 111 32 67 108 101 118 101 108 97 110 100 33 0
Decoded message: Hello Cleveland!

With Serial.println(i); commented out:

Hello Cleveland!
...is encoded as...
SGVsbG8gQ2xldmVsYW5kIQA=

SGVsbG8gQ2xldmVsYW5kIQA=
Size of encoded message (including NULL terminator): 25
Bytes needed for decoded message: 17
Bytes decoded: 5
72 101 108 108 111
Decoded message: Hello▒?

Solution

  • You have declared an array of 4 chars:

    int b64dec(char *enc, char *dec, size_t enc_len) {
      unsigned char buffer[4]; 
    

    and you index it beyond [3]:

      for (i=0; i<enc_len - padded - 4; i+=4) {
        buffer[i] = ...
    

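    Writing past the end of buffer is undefined behavior. From the second pass on (i == 4), the writes to buffer[4] through buffer[7] land on whatever happens to sit next to buffer on the stack, quite possibly i or j themselves. Adding or removing the Serial.println(i) call changes how the compiler lays out those variables or keeps them in registers, which is likely why the symptom appears and disappears. Index buffer with 0 through 3 in the loop and in the switch, as b64enc() already does, and keep i only for indexing enc.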
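
The fix can be sketched in plain C as follows. This is a minimal sketch rather than a drop-in replacement: the names b64dec_fixed and b64val are hypothetical (they are not in the original library), the Serial debug output is left out, and the loop is restructured to count groups of four characters so that the temporary buffer is only ever indexed 0 through 3 while a separate offset walks the input.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static const char b64map[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
static const char b64pad = '=';

/* Hypothetical helper, not part of the original library: returns the 6-bit
   value of a base64 character, or -1 if it is not in the alphabet. */
static int b64val(char c) {
  const char *p = (c != '\0') ? strchr(b64map, c) : NULL;
  return p ? (int)(p - b64map) : -1;
}

/* Same contract as the question's b64dec(): enc_len includes the NUL
   terminator. Returns the number of bytes written, or 0 on invalid input. */
int b64dec_fixed(const char *enc, char *dec, size_t enc_len) {
  unsigned char buffer[4];  /* Only ever indexed 0..3. */
  assert(enc != NULL && dec != NULL);

  /* Need at least one full group plus the terminator. */
  if (enc_len < 5 || (enc_len - 1) % 4 != 0) return 0;

  int padded = 0;
  if (enc[enc_len - 2] == b64pad) padded++;
  if (enc[enc_len - 3] == b64pad) padded++;

  size_t groups = (enc_len - 1) / 4;
  size_t j = 0;
  for (size_t g = 0; g < groups; g++) {
    size_t i = g * 4;                            /* Offset into enc. */
    int n = (g == groups - 1) ? 4 - padded : 4;  /* Data chars in group. */

    /* Fill the 4-slot buffer; padding positions count as zero bits. */
    for (int k = 0; k < 4; k++) {
      int v = (k < n) ? b64val(enc[i + k]) : 0;
      if (v < 0) return 0;  /* Character outside the base64 alphabet. */
      buffer[k] = (unsigned char)v;
    }

    /* Four 6-bit values become up to three 8-bit bytes. */
    dec[j++] = (char)((buffer[0] << 2 | buffer[1] >> 4) & 0xFF);
    if (n > 2) dec[j++] = (char)((buffer[1] << 4 | buffer[2] >> 2) & 0xFF);
    if (n > 3) dec[j++] = (char)((buffer[2] << 6 | buffer[3]) & 0xFF);
  }
  return (int)j;
}
```

Called as b64dec_fixed(enc_msg, dec_result, sizeof enc_msg) with the question's "SGVsbG8gV29ybGQhAA==", this should write the 13 bytes of "Hello World!" plus its terminator. Counting whole groups also avoids the unsigned subtraction enc_len - padded - 4 in the loop bound, which wraps around to a huge value for a single doubly padded group such as "QQ==".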