
Can't get strtol() to detect overflow


This program uses a lexical scanner to classify tokens as symbol, string, decimal number, hex number,… When a "number" is detected, it is handed over to strtol() to convert it to the internal 32-bit binary value. However, I can't get strtol() to reliably return an error on overflow.

Part of the conversion code is:

    errno = 0;      // erase any previous error in errno
    switch (constType) {
    …
    case lxHex:     // hexadecimal number X'1234567890ABCDEF' (X-string)
        fprintf(stderr, "** FindConstantFromString - converting %s\n", constBuffer);
        newDictEntry->dcValue = strtol(constBuffer+2, NULL, 16);
        int myerr = errno;
        fprintf(stderr, "     value %x errno %d\n", newDictEntry->dcValue, myerr);
        newDictEntry->dcType = syNumber;
        newDictEntry->dcSubType = 4;    // hexadecimal
        if (EINVAL == errno || ERANGE == errno) {
            ErrDict = newDictEntry;
            AnaError(ConstMsg+2);
            newDictEntry->dcType = sySLit;
        }
        result.cstClass = newDictEntry->dcType;
        return result;
    …

When this code is tested with bad input, it detects overflow only if the first hex digit is >= 8 (which would give a negative value), as demonstrated by:

    29            | declare v;
    30            | v = x'fedcba9876543210'
    ** FindConstantFromString - meeting x'fedcba9876543210' as 11
    ** FindConstantFromString - converting x'fedcba9876543210'
         value ffffffff errno 34
    *Error 32: Candidate number x'fedcba9876543210' too large or could not be converted
    *Error 20: Unrecognisable lexical unit x'fedcba9876543210' at 30.5
    31            |     + x'123456789abcdef'
    ** FindConstantFromString - meeting x'123456789abcdef' as 11
    ** FindConstantFromString - converting x'123456789abcdef'
         value 89abcdef errno 0
    32            |     + 9876543210
    ** FindConstantFromString - meeting x'fedcba9876543210' as 8
                                symbol already known
    ** FindConstantFromString - converting x'fedcba9876543210'
         value 0 errno 0
    *Error 32: Candidate number x'fedcba9876543210' too large or could not be converted
    ** FindConstantFromString - meeting 9876543210 as 8
    ** FindConstantFromString - converting 9876543210
         value 4cb016ea errno 0
    33            |     + '12345a'
    ** FindConstantFromString - meeting 12345a as 3
    34            |     + '';
    ** FindConstantFromString - meeting 12345a as 8
                                symbol already known
    ** FindConstantFromString - converting 12345a
         value 3039 errno 0
    *Error 32: Candidate number 12345a too large or could not be converted
    ** FindConstantFromString - meeting  as 3
    ** FindConstantFromString - meeting  as 8
                                symbol already known
    *Error 33: Empty string cannot be converted to number

At line 30, the lexical scanner recognized a hex number and requested conversion from the hex form (11 = lxHex). strtol() correctly sets errno to ERANGE and an error message is issued. The overflowed hex number is then kept in the dictionary as a string.

Note that the returned value is -1, not LONG_MAX.

At line 31, we have another overflowing hex number, but its first digit is not in the range 8-f. It is again detected as a hex number. Conversion is attempted, but errno is not set at all. The value corresponds to the lower 32 bits of the number. Since this is considered a success, the truncated value is kept as the result.

When + is applied to "x'fed…'" and 89abcdef, another conversion is attempted on the string "x'fed…'", this time as a decimal number (denoted by request 8), and the conversion fails because "x" cannot begin a decimal number.

At line 32, we have an overflowing decimal number, 9876543210. Once again, overflow is not detected (the code is not shown but is similar to the code for hex numbers, with an additional test on "endptr" because these strings are not necessarily filtered by the lexical scanner and may contain illegal characters). The returned value contains the least significant 32 bits of the number.

If I change strtol() to strtoul(), the first ERANGE error disappears and I get the least significant 32 bits of the number.
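The three conversions can be reproduced without the scanner or the dictionary. The sketch below is a minimal reproduction under two assumptions: long is 64 bits wide (as on x86-64 Fedora), and the cast to uint32_t stands in for the assignment to the 32-bit dcValue field. `low32_strtol` and `low32_strtoul` are hypothetical helper names. The digits are passed bare; the question passes constBuffer+2, which still ends in a quote, and strtol() simply stops converting there.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: convert with strtol() and keep only the low 32 bits,
   as happens when the long result is stored in a 32-bit field.
   errno is cleared first, so on return it holds whatever strtol() set
   (ERANGE only when the value does not fit in a long). */
uint32_t low32_strtol(const char *digits, int base)
{
    errno = 0;
    long full = strtol(digits, NULL, base);
    /* Conversion to an unsigned type is well defined: the value is reduced
       modulo 2^32, so only the low 32 bits survive. */
    return (uint32_t)full;
}

/* Hypothetical helper: the same experiment with strtoul(),
   whose range is 0 .. ULONG_MAX. */
uint32_t low32_strtoul(const char *digits, int base)
{
    errno = 0;
    unsigned long full = strtoul(digits, NULL, base);
    return (uint32_t)full;
}
```

With a 64-bit long, `low32_strtol("fedcba9876543210", 16)` returns 0xffffffff with errno set to ERANGE, while `low32_strtol("123456789abcdef", 16)` and `low32_strtol("9876543210", 10)` return 0x89abcdef and 0x4cb016ea with errno left at 0, matching the log above. `low32_strtoul("fedcba9876543210", 16)` returns 0x76543210 without any error.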

What am I doing wrong?

System: Fedora Linux 29 glibc: 2.27


Solution

  • strtol() derives a long (i.e. a signed long) value from the given string. On 64-bit Linux, long is 64 bits wide, and strtol() only cares about overflows or underflows of the 64-bit long value that it constructs internally and eventually returns to its caller if no problems are encountered. It does not care at all about overflows or underflows of 32-bit values.

    This is why strtol() only reports an error for the one example you show whose value exceeds LONG_MAX, that is, where the value would have overflowed into a negative 64-bit number. (And as you noted, strtoul() does not complain in that case, because the value still fits in an unsigned long. You would need to feed strtoul() a hex string with 17 significant digits to overflow an unsigned long.)

    strtol() also does not know or care that your program takes its 64-bit long result and immediately discards the upper 4 bytes by assigning the value to a 32-bit variable. This truncation is why you were led to think that "the returned value is -1, not LONG_MAX". In fact strtol() did return LONG_MAX (0x7fffffffffffffff), but your program discarded its top 4 bytes and kept only the low 4 bytes, whose value is 0xffffffff, or -1 when treated as a 32-bit int.

    If you want to use strtol() to generate and vet 32-bit values, then you'll have to do additional range checking yourself. First collect the strtol() result into a long variable and check whether errno indicates a 64-bit overflow or underflow during the call; then compare that long result against INT_MAX and INT_MIN to see whether its value would overflow or underflow a 32-bit int.

    Obviously you can wrap this up in a little function, which could (if you do the appropriate tinkering with errno) behave just like strtol() except that it applies to int values rather than long. However, you should resist the urge to name your function strtoi, because names that begin with str followed by a lowercase letter are reserved by the C standard (and by POSIX) for future additions to <stdlib.h> and <string.h>. Some systems might already have a strtoi, and Linux might get one someday.
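A minimal sketch of such a wrapper follows, assuming a 32-bit int target as in the question. `parse_int` is a hypothetical name chosen to stay clear of the reserved str prefix. It reports out-of-range values the way strtol() does at its own limits: it returns the nearest limit (INT_MAX or INT_MIN) and sets errno to ERANGE. On success it restores the caller's errno, since strtol() never sets errno to zero itself.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical strtol() counterpart for int (not a standard function).
   Out-of-range input yields INT_MAX or INT_MIN with errno = ERANGE. */
int parse_int(const char *nptr, char **endptr, int base)
{
    int saved_errno = errno;
    errno = 0;                          /* detect errors from this call only */
    long val = strtol(nptr, endptr, base);
    int err = errno;

    if (err == ERANGE)                  /* out of range even for long */
        return val < 0 ? INT_MIN : INT_MAX;
    if (val > INT_MAX) {                /* fits in long, but not in int */
        errno = ERANGE;
        return INT_MAX;
    }
    if (val < INT_MIN) {
        errno = ERANGE;
        return INT_MIN;
    }
    if (err == 0)
        errno = saved_errno;            /* like strtol(): leave errno alone on success */
    return (int)val;                    /* in range, so the conversion is exact */
}
```

For the question's inputs, `parse_int("123456789abcdef", NULL, 16)` and `parse_int("9876543210", NULL, 10)` now return INT_MAX with errno set to ERANGE instead of silently yielding the truncated low 32 bits. Any other errno value strtol() may set (glibc uses EINVAL for an unsupported base) is passed through unchanged.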