Search code examples
cprintfgetcharcc

Weird output of char * with getchar to printf


I'm dusting off of my C skills for an upcoming class and I came across this weird output with printf after building a string using getchar. Specifically, any string I try to output gets the same sequence of characters appended to each letter. foo becomes "f?8@{?o?8@{?o?8@{?" compiling with cc, and f¿:¿o¿:¿0¿:¿ with Apple LLVM 5.0 (Xcode). Here is the sample code that illustrates the issue:

char * input_buffer = malloc( sizeof( char ) );

char c;
while ( ( c = getchar() ) != '\n' ) {
    strcat(input_buffer, &c);
}

// problem output
printf( "\n%s\n", input_buffer );
// foo -> f¿:¿o¿:¿0¿:¿

// weird side effect is the 4 is required to get a proper len
printf("\ncharacters: %lu\n", strlen( input_buffer ) / 4 );

I've searched everywhere but I'm not seeing this anywhere else, but then this seems like a bit of an edge case. Is this is some kind of an encoding issue that I am not taking into account?


Solution

  • char * input_buffer = malloc( sizeof( char ) );
    

    sizeof (char) is 1 by definition. This allocates space for a single character, and makes input_buffer point to it.

    You're also not checking whether the allocation succeeded. malloc returns a null pointer on failure; you should always check for that.

    And the allocated char object that input_buffer points to contains garbage.

    char c;
    while ( ( c = getchar() ) != '\n' ) {
        strcat(input_buffer, &c);
    }
    

    getchar() returns an int, not a char. You can assign the result to a char object, but by doing so you lose the ability to detect and end-of-file or error condition. getchar() returns EOF when there are no more characters to be read; you should always check for that, and doing so requires storing the result in an int. (EOF is an integer value that's unequal to any valid character.)

        strcat(input_buffer, &c);
    

    input_buffer points to a single uninitialized char. You can treat it as an array consisting of a single char element. The first argument to strcat must already contain a valid null-terminated string, and it must have enough space to hold that string plus whatever you're appending to it.

    c is a single char object, containing whatever character you just read with getchar(). The second argument tostrcatis achar*, so you've got the right type -- but thatchar*` must point to a valid null-terminated string.

    strcat will first scan the array pointed to by input_buffer to find the terminating '\0' character so it knows where to start appending -- and it will probably scan into memory that's not part of any object you've declared or allocated, possibly crashing your program. If that doesn't blow up, it will then copy characters starting at c, and going past it into memory that you don't own. You have multiple forms of undefined behavior.

    You don't need to use strcat to append a single character to a string; you can just assign it.

    Here's a simple example:

    char input_buffer[100];
    int i = 0; /* index into input_buffer */
    int c;
    while ((c = getchar()) != '\n' && c != EOF) {
        input_buffer[i] = c;
        i ++;
    }
    input_buffer[i] = '\0'; /* ensure that it's properly null-terminated */
    

    I allocated a fixed-size buffer rather than using malloc, just for simplicity.

    Also for simplicity, I've omitted any check that the input doesn't go past the end of the input buffer. If it does, the program may crash if you're lucky; if you're not lucky, it may just appear to work while clobbering memory that doesn't belong to you. It will work ok if the input line isn't too long. In any real-world program, you'll want to check for this.

    BTW, what's being done here is more easily done using fgets() -- but it's good to learn how things work on a slightly lower level.