Search code examples
cloopsstdoutstdininfinite-loop

Very complex C infinite loop debugging


I want to color stand alone digits, not anything that is preceded with an alphabetical character and terminated with a numeric one.

Can anyone guess why when I use this program to write some existing text file to another text file, it is several megabytes in size in just seconds? There is some infinite loop occurring.

char ch;
_Bool finished = true;
void color();
void skip();

int main()
{
    while (finished)
    {
        ch = getchar();
        if (ch == -1) {
            finished = false;
        }
        if ( (ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z') ) {
            skip();
        }
        else if (ch >= '0' && ch <= '9'){
            color();
        }
        else {
            printf("%c", ch);
        }
    }
    return 0;
}

void color() // color characters that are numbers
{
    printf("\e[31m%c\e[0m", ch);
}

void skip() // skip whole words and still print them
{
    putchar(ch);
    while ((ch = getchar()) != ' ') {
        printf("%c", ch);
    }
}

Solution

  • Infinite Loops

    The infinite loop you report is likely caused by incorrect handling of EOF in the posted code. A plain char may be signed or unsigned, and so may not be able to hold the value of EOF, which is typically -1.

    The skip() function also fails to account for the possibility that EOF may be encountered. Here, if EOF is encountered before a space, e.g., in the last word of a file, an infinite loop will result.

    Header Files

    The posted code is missing header files. At least #include <stdio.h> and #include <stdbool.h> are needed. stdio.h is needed for obvious reasons, but stdbool.h is needed also, as that is where the true and false macros are defined. The _Bool type can still be used, but it would be better to use the typedef bool which is defined in stdbool.h.

    Function declarations

    Empty parentheses in a function prototype indicates an unspecified number of arguments. void must be used to specify that no arguments are to be taken. This usage of empty parentheses in function declarators is an obsolescent feature of the language, and should not be used anyway.

    ANSI Escape Codes

    The ability to print colors in C is dependent upon the platform and terminal emulator. Some tutorials use \e as a stand-in for the escape character, but to generate this character in C you should probably use \x1b (the ASCII hexadecimal value for the ESC character; you could also use \033, which is the ASCII octal value for ESC). My reading of the Standard is that \e is not a valid character constant, and using this generates the compiler warning:

    warning: non-ISO-standard escape sequence, '\e'

    But, this does appear to work for me in GCC on Linux, so I suspect that this is an extension.

    Logic Problems

    The skip() function fails to check for EOF, but also fails to print spaces when they are encountered. Further, a newline character should be able to precede a digit, since this signals a line-ending. Not handling this correctly leads to failure to highlight initial digits on the lines following the first line of input. These problems can be resolved by testing for EOF and \n in the loop condition, and by printing the character which caused the loop to terminate if it is a space or newline.

    void skip(void) // skip whole words and still print them
    {
        putchar(ch);
        while ((ch = getchar()) != ' ' && ch != '\n' && ch != EOF) {
            printf("%c", ch);
        }
        if (ch == ' ' || ch == '\n') {
            putchar(ch);
        }
    }
    

    There is a similar problem in main(). When EOF is encountered in the main loop, the final printf() is still executed, printing a character when none should be printed. One solution is to place the statements after the EOF test in an else block, though there are better solutions.

    while (finished)
    {
        ch = getchar();
        if (ch == EOF) {   // use EOF instead of -1
            finished = false;
        } else {           // need this to avoid printing a character for EOF
            if ( (ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z') ) {
                skip();
            } else if (ch >= '0' && ch <= '9') {
                color();
            }
            else {
                printf("%c", ch);
            }
        }
    }
    

    Here is the program so far. This works as expected, as far as I can tell from your description of the expected behavior:

    #include <stdio.h>
    #include <stdbool.h>
    
    int ch;
    bool finished = true;
    
    void color(void);
    void skip(void);
    
    int main(void)
    {
        while (finished)
        {
            ch = getchar();
            if (ch == EOF) {
                finished = false;
            } else {
                if ( (ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z') ) {
                    skip();
                } else if (ch >= '0' && ch <= '9') {
                    color();
                }
                else {
                    printf("%c", ch);
                }
            }
        }
        return 0;
    }
    
    void color(void) // color characters that are numbers
    {
        printf("\x1b[31m%c\x1b[0m", ch);
    }
    
    void skip(void) // skip whole words and still print them
    {
        putchar(ch);
        while ((ch = getchar()) != ' ' && ch != '\n' && ch != EOF) {
            printf("%c", ch);
        }
        if (ch == ' ' || ch == '\n') {
            putchar(ch);
        }
    }
    

    Improvements

    There are a number of improvements that can be made to this code.

    There is no need for the bool at all. A break statement could simply terminate the main loop when EOF is encountered. In this case, the main loop would need to execute indefinitely, with while (1) {}, or better, for (;;) {}. Even better, test ch in the loop condition with while ((ch = getchar()) != EOF) {}.

    Having removed the global variable finished, we can also make ch local to main(). This may require passing the value of ch to both color() and skip(), in which case the function signatures need to be changed. But, note that there is no reason to pass a character to skip(), since this character can simply be printed in main() before calling skip(). Further, there is no need for a color() function, since this one-line function could simply be manually inlined.

    There is no need to use printf() where a putchar() will do.

    It would be nice to #define a couple of escape code macros. This is easier to read, and easier to modify.

    Finally, it would be better to use the functions in ctype.h to check whether a character is a digit, or an alphabetic character, or a whitespace character. This is more portable, and less error-prone, than the direct comparisons in the posted code. By using isspace() in the skip() function, the \n character is automatically checked for, avoiding the earlier issue with forgetting to test for line-endings. This also handles other whitespace characters, such as \t. Note that the functions in ctype.h expect arguments which can be represented as unsigned char values, so a cast is needed here.

    Here is the improved code:

    #include <stdio.h>
    #include <ctype.h>
    
    #define RED    "\x1b[31m"
    #define RESET  "\x1b[0m"
    
    void skip(void);
    
    int main(void)
    {
        int ch;
        while ((ch = getchar()) != EOF) {
            if (isalpha((unsigned char) ch)) {
                putchar(ch);
                skip();
            } else if (isdigit((unsigned char) ch)) {
                printf(RED "%c" RESET, ch);
            } else {
                putchar(ch);
            }
        }
        return 0;
    }
    
    void skip(void) // skip whole words and still print them
    {
        int c = getchar();
        while (!isspace((unsigned char) c) && c != EOF) {
            putchar(c);
            c = getchar();
        }
        if (isspace((unsigned char) c)) {
            putchar(c);
        }
    }
    

    Here is a sample program output. The colors don't show up here, so I have put parentheses around the numbers that are highlighted in red on my terminal:

    (1) this is a test testing123 (456) test
    (2) second line test (3) (4) (5)
    (3)rd line test
    (4) (5) (6) (789)