
strcat adds junk to the string


I'm trying to reverse each word in a sentence without changing the order of the words.

For example: "Hello World" => "olleH dlroW"

Here is my code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char * reverseWords(const char *text);
char * reverseWord(char *word);

int main () {
  char *text = "Hello World";
  char *result = reverseWords(text);
  char *expected_result = "olleH dlroW";
  printf("%s == %s\n", result, expected_result);
  printf("%d\n", strcmp(result, expected_result));
  return 0;
}

char *
reverseWords (const char *text) {
  // This function takes a string and reverses each of its words.
  int i, j;
  size_t len = strlen(text);
  size_t text_size = len * sizeof(char);
  // output holds the result
  char *output;

  // temp_word is a temporary variable,
  // it contains each word and it will be
  // empty after each space.
  char *temp_word;

  // temp_char is a temporary variable,
  // it contains the current character
  // within the for loop below.
  char temp_char;

  // allocating memory for output.
  output = (char *) malloc (text_size + 1);

  for(i = 0; i < len; i++) {

    // if the text[i] is space, just append it
    if (text[i] == ' ') {
      output[i] = ' ';
    }

    // if the text[i] is NULL, just get out of the loop
    if (text[i] == '\0') {
      break;
    }

    // allocate memory for the temp_word
    temp_word = (char *) malloc (text_size + 1);

    // set j to 0, so we can iterate only on the word
    j = 0;

    // while text[i + j] is not space or NULL, continue the loop
    while((text[i + j] != ' ') && (text[i + j] != '\0')) {

      // copy text[i+j] into temp_char
      // (text[i+j] is already a char, so the cast is redundant)
      temp_char = (char) text[i+j];

      // concat temp_char to the temp_word
      strcat(temp_word, &temp_char); // <= PROBLEM

      // add one to j
      j++;
    }

    // after the loop, concat the reversed version
    // of the word to the output
    strcat(output, reverseWord(temp_word));

    // if text[i+j] is space, concat space to the output
    if (text[i+j] == ' ')
      strcat(output, " ");

    // free the memory allocated for the temp_word
    free(temp_word);

    // add j to i, so we skip
    // the characters that were already read.
    i += j;
  }

  return output;
}

char *
reverseWord (char *word) {
  int i, j;
  size_t len = strlen(word);
  char *output;

  output = (char *) malloc (len + 1);

  j = 0;
  for(i = (len - 1); i >= 0; i--) {
    output[j++] = word[i];
  }

  return output;
}

The problem is on the line marked with <= PROBLEM. For the first word, "Hello", everything works fine.

For the second word, "World", junk characters end up in temp_word. I checked with gdb: temp_char does not contain the junk, but after strcat runs, the last character appended to temp_word looks like W\006.

\006 is appended after every character of the second word.

The output I see on the terminal looks fine, but strcmp comparing result with expected_result returns -94.

  • What can be the problem?
  • What's the \006 character?
  • Why does strcat add it?
  • How can I prevent this behavior?

Solution

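  • strcat() expects both of its arguments to point to the first character of a C string, that is, a char array in which at least one element equals '\0'.

    Neither the memory temp_word points to nor the memory &temp_char points to meets this requirement: the buffer returned by malloc() is uninitialized, and temp_char is a single char with no terminator guaranteed to follow it. strcat() therefore keeps reading past temp_char until it happens to find a zero byte, which is the likely source of stray characters such as \006.

    This invokes undefined behaviour, so anything can happen from then on.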

    A possible fix would be to change

          temp_word = (char *) malloc (text_size + 1);
    

    to become

          temp_word = malloc (text_size + 1); /* Not the issue but the cast is 
                                                 just useless in C. */
          temp_word[0] = '\0';
    

    and this

            strcat(temp_word, &temp_char);
    

    to become

            strcat(temp_word, (char[2]){temp_char});
    

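    The rest of the code has further issues: output is also passed to strcat() without first being set to an empty string, reverseWord() never writes the terminating '\0' into its result, and the buffer reverseWord() returns is never freed.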
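
Another way to sidestep these pitfalls is to avoid strcat() altogether: copy the input once, then reverse each word in place inside the copy. The following is only a sketch of that alternative, not the asker's original design. The names reverse_words and reverse_range are made up for illustration, and the code assumes that only the ' ' character separates words.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: reverse the characters in buf[start..end) in place. */
static void reverse_range(char *buf, size_t start, size_t end) {
    while (start + 1 < end) {
        char tmp = buf[start];
        buf[start] = buf[end - 1];
        buf[end - 1] = tmp;
        start++;
        end--;
    }
}

/* Hypothetical replacement for reverseWords(): returns a newly allocated
 * copy of text in which every space-separated word is reversed.
 * The caller must free() the result. Returns NULL if malloc() fails. */
char *reverse_words(const char *text) {
    assert(text != NULL);

    size_t len = strlen(text);
    char *output = malloc(len + 1);
    if (output == NULL) {
        return NULL;
    }

    /* Copy len + 1 bytes so the terminating '\0' comes along too. */
    memcpy(output, text, len + 1);

    size_t word_start = 0;
    for (size_t i = 0; i <= len; i++) {
        /* A space or the terminator marks the end of the current word. */
        if (output[i] == ' ' || output[i] == '\0') {
            reverse_range(output, word_start, i);
            word_start = i + 1;
        }
    }
    return output;
}
```

Because the copy is always terminated and nothing is ever appended to it, no buffer needs to be initialized to an empty string first, and there is only one allocation for the caller to free.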