I'm dusting off my C skills for an upcoming class, and I came across this weird output with printf after building a string using getchar. Specifically, any string I try to output gets the same sequence of characters appended to each letter. foo becomes "f?8@{?o?8@{?o?8@{?" when compiling with cc, and f¿:¿o¿:¿0¿:¿ with Apple LLVM 5.0 (Xcode). Here is the sample code that illustrates the issue:
char * input_buffer = malloc( sizeof( char ) );
char c;
while ( ( c = getchar() ) != '\n' ) {
strcat(input_buffer, &c);
}
// problem output
printf( "\n%s\n", input_buffer );
// foo -> f¿:¿o¿:¿0¿:¿
// weird side effect: dividing by 4 is required to get the proper length
printf("\ncharacters: %lu\n", strlen( input_buffer ) / 4 );
I've searched everywhere and I'm not seeing this anywhere else, but then this seems like a bit of an edge case. Is this some kind of encoding issue that I'm not taking into account?
char * input_buffer = malloc( sizeof( char ) );
sizeof (char) is 1 by definition. This allocates space for a single character, and makes input_buffer point to it.
You're also not checking whether the allocation succeeded. malloc returns a null pointer on failure; you should always check for that.
And the allocated char object that input_buffer points to contains garbage, since malloc does not initialize the memory it returns.
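To make those three points concrete, here is a minimal sketch of an allocation that asks for more than one byte, checks the result, and starts out as a valid empty string. The helper name make_empty_buffer and its capacity parameter are made up for illustration; they are not part of your code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocate `capacity` bytes that already hold a
 * valid empty string. Returns NULL if the allocation fails. */
char *make_empty_buffer(size_t capacity)
{
    assert(capacity > 0);            /* need room for at least the '\0' */

    char *buffer = malloc(capacity);
    if (buffer == NULL) {
        return NULL;                 /* allocation failed; caller must check */
    }
    buffer[0] = '\0';                /* no longer garbage: an empty C string */
    return buffer;
}
```

A caller would check the result for NULL before using it, and call free() on it when done.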
char c;
while ( ( c = getchar() ) != '\n' ) {
strcat(input_buffer, &c);
}
getchar() returns an int, not a char. You can assign the result to a char object, but by doing so you lose the ability to detect an end-of-file or error condition. getchar() returns EOF when there are no more characters to be read; you should always check for that, and doing so requires storing the result in an int. (EOF is an integer value that's unequal to any valid character.)
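As a small sketch of that pattern, the function below counts the characters on one line while keeping the result of each read in an int. The name count_line_chars is hypothetical, and I've used fgetc on a FILE * parameter so it can be tried on any stream; fgetc(stdin) behaves like getchar().

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: count the characters on the next line of `in`,
 * stopping at a newline or at end-of-file, whichever comes first. */
long count_line_chars(FILE *in)
{
    assert(in != NULL);

    long count = 0;
    int c;                           /* int, so EOF stays distinguishable */
    while ((c = fgetc(in)) != '\n' && c != EOF) {
        count++;
    }
    return count;
}
```

Calling count_line_chars(stdin) gives the same loop shape as a getchar() loop, but it terminates cleanly if input ends without a newline.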
strcat(input_buffer, &c);
input_buffer points to a single uninitialized char. You can treat it as an array consisting of a single char element. The first argument to strcat must already contain a valid null-terminated string, and it must have enough space to hold that string plus whatever you're appending to it.
c is a single char object, containing whatever character you just read with getchar(). The second argument to strcat is a char*, so you've got the right type -- but that char* must point to a valid null-terminated string.
strcat will first scan the array pointed to by input_buffer to find the terminating '\0' character so it knows where to start appending -- and it will probably scan into memory that's not part of any object you've declared or allocated, possibly crashing your program. If that doesn't blow up, it will then copy characters starting at c, and going past it into memory that you don't own. That is most likely why the same few junk characters show up after every letter in your output: each call copies c plus whatever bytes happen to follow it in memory, up to the next zero byte. You have multiple forms of undefined behavior.
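If you did want to keep using strcat, both arguments would have to meet those requirements. Here is a rough sketch, assuming a hypothetical helper append_char that is told the total size of the destination array: it wraps the character in a real two-element string before appending.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: append one character to the string in `dest`
 * using strcat. `dest_size` is the total size of the array holding
 * `dest`. Returns 1 on success, 0 if there is no room left. */
int append_char(char *dest, size_t dest_size, char ch)
{
    assert(dest != NULL);

    size_t len = strlen(dest);       /* dest must already be a valid string */
    if (len + 2 > dest_size) {       /* need room for ch and the '\0' */
        return 0;
    }
    char tmp[2] = { ch, '\0' };      /* a genuine one-character string */
    strcat(dest, tmp);
    return 1;
}
```

The destination must start out as a valid string, for example char buf[100] = ""; before the first call.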
You don't need to use strcat to append a single character to a string; you can just assign it.
Here's a simple example:
char input_buffer[100];
int i = 0; /* index into input_buffer */
int c;
while ((c = getchar()) != '\n' && c != EOF) {
input_buffer[i] = c;
i ++;
}
input_buffer[i] = '\0'; /* ensure that it's properly null-terminated */
I allocated a fixed-size buffer rather than using malloc, just for simplicity.
Also for simplicity, I've omitted any check that the input doesn't go past the end of the input buffer. If it does, the program may crash if you're lucky; if you're not lucky, it may just appear to work while clobbering memory that doesn't belong to you. It will work ok if the input line isn't too long. In any real-world program, you'll want to check for this.
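For reference, one way to add that check is sketched below. The function name read_line and its parameters are hypothetical, and discarding the characters that don't fit is just one possible policy; a real program might instead grow the buffer or report an error.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: read one line from `in` into `buf`, which has
 * room for `size` bytes, never writing past the end. Characters that
 * do not fit are read and discarded. Returns the number of characters
 * stored, not counting the terminating '\0'. */
size_t read_line(FILE *in, char *buf, size_t size)
{
    assert(in != NULL && buf != NULL && size > 0);

    size_t i = 0;
    int c;
    while ((c = fgetc(in)) != '\n' && c != EOF) {
        if (i < size - 1) {          /* always leave room for the '\0' */
            buf[i] = (char)c;
            i++;
        }
    }
    buf[i] = '\0';                   /* ensure proper null termination */
    return i;
}
```

With char input_buffer[100]; a call would look like read_line(stdin, input_buffer, sizeof input_buffer).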
BTW, what's being done here is more easily done using fgets() -- but it's good to learn how things work on a slightly lower level.
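A minimal sketch of the fgets() version, assuming a hypothetical wrapper named read_line_fgets. Note that fgets() stores the newline in the buffer when it fits, so the sketch strips it with strcspn().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read a line with fgets and strip the trailing
 * newline, if any. Returns 1 if something was read, 0 on end-of-file
 * or error. */
int read_line_fgets(FILE *in, char *buf, int size)
{
    assert(in != NULL && buf != NULL && size > 0);

    if (fgets(buf, size, in) == NULL) {
        return 0;                    /* nothing read: EOF or error */
    }
    buf[strcspn(buf, "\n")] = '\0';  /* fgets keeps the '\n'; remove it */
    return 1;
}
```

If the line is longer than the buffer, fgets() stops early and the rest of the line stays in the stream for the next read, so a real program would want to detect that case.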