Tags: c, casting, integer, char, c-strings

Does casting from char to int always give positive values in C?


I was writing production-ready C where I need to find the frequency of characters in a char array pretty fast. I was trying to remove an assert call that checks for positive values during a cast. Is my assert redundant code or is it necessary?

    char input[] = "Hello World";
    int inputLength = sizeof(input)/ sizeof(char);
    int *frequencies = calloc(256, sizeof(int));
    for(int i = 0; i < inputLength-1; i++)
    {
        int value = (int) input[i];
        assert(value > -1);//Is this line redundant?
        frequencies[value] += 1;
    }
    printf("(%d)", inputLength);
    PrintFrequencies(frequencies);
    free(frequencies);

Solution

  • Does casting from char to int always give positive values in C?

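    Generally speaking, no. char may be either a signed or an unsigned type, at the C implementation's discretion. It is signed on many common platforms (x86 with GCC, Clang or MSVC, for example) and unsigned on others (Linux on ARM, for example), so portable code cannot assume either.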

    All char values representing members of the basic execution character set are guaranteed to be non-negative. This includes the upper- and lowercase Latin letters, the decimal digits, a variety of punctuation, the space character and a few control characters. The char values representing other characters may be negative, however. Also, the multiple char values constituting the representation of a multi-byte character can include some that, considered as individual chars, are negative.
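
    To make the signedness issue concrete, here is a minimal sketch. The helper names as_int, as_index and char_is_signed are made up for illustration, and the sketch assumes a typical implementation with 8-bit bytes where int is wider than char.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: plain conversion. The result is negative whenever
   char is signed and the stored value is below zero. */
int as_int(char c)
{
    return c; /* implicit conversion; an explicit (int) cast adds nothing */
}

/* Hypothetical helper: convert via unsigned char first, so the result
   lies in 0..UCHAR_MAX and is safe to use as an array index. */
int as_index(char c)
{
    int index = (unsigned char)c;
    assert(index >= 0 && index <= UCHAR_MAX); /* a true invariant, so an assert fits here */
    return index;
}

/* CHAR_MIN is 0 when char is unsigned and negative when it is signed. */
int char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```

    Where char is signed and 8 bits wide, as_int('\xE9') yields -23 while as_index('\xE9') yields 233; where char is unsigned, both yield 233. Bytes like 0xE9 turn up in Latin-1 text and inside UTF-8 multi-byte sequences.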

    I was writing production-ready C where I need to find the frequency of characters in a char array pretty fast. I was trying to remove an assert call that checks for positive values during a cast. Is my assert redundant code or is it necessary?

    Your assert() is semantically wrong. If you're reading arbitrary text and you want your program to be robust, then you do need to be prepared for chars with negative values. But:

    1. Assertions are the wrong tool for this job. They are for checking that the invariants your program assumes in fact hold. You might use an assertion if you (thought you) had a guarantee that char values were always non-negative, for example. If an assertion ever fails, it means your code is wrong.

      You must never use an assertion to validate input data or perform any other test that your program relies upon being performed, because depending on how you compile the program, the asserted expression might not be evaluated at all: if NDEBUG is defined when <assert.h> is included, assert() expands to nothing.

    2. It would be better for your program to handle negative char values if they are encountered than to fail. In this regard, do note that there's no particular use in converting your char explicitly to int. You can use a char directly anywhere you want an integer. On the other hand, it might make sense to cast to unsigned char, as that will be cheap -- possibly free, even if char is signed -- and it will take care of your signedness problem.
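
    Putting the second point into practice, the counting loop might look like the sketch below. count_bytes is a hypothetical helper, not something from the question, and it assumes the caller supplies a zeroed table of UCHAR_MAX + 1 counters. The cast to unsigned char is what keeps the index in range, so the loop stays safe even when assertions are compiled out with NDEBUG. The one assert that remains checks for a programming error (a null pointer), not for bad input data.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: tally how often each byte value occurs in text.
   counts must point to UCHAR_MAX + 1 counters, zeroed by the caller. */
void count_bytes(const char *text, size_t length, size_t counts[UCHAR_MAX + 1])
{
    /* Invariant check only: callers are never supposed to pass NULL. */
    assert(text != NULL && counts != NULL);

    for (size_t i = 0; i < length; i++) {
        /* Converting to unsigned char maps every possible char value
           into 0..UCHAR_MAX, whether plain char is signed or not. */
        unsigned char index = (unsigned char)text[i];
        counts[index]++;
    }
}
```

    With the question's array, one way to call it is to declare size_t counts[UCHAR_MAX + 1] = {0}; and then call count_bytes(input, strlen(input), counts); (strlen needs <string.h>). Putting the table on the stack also avoids the unchecked calloc() in the original.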