Search code examples
cpointersstrtok

Function returns char**s, running the function twice causes the return value to differ


Description of what my function attempts to do

My function gets a string for example "Ab + abc EF++aG hi jkL" and turns it into ["abc", "hi"]

In addition, the function only takes into account letters and the letters all have to be lowercase.

The problem is that

char* str1 = "Ab +  abc EF++aG hi  jkL";
char* str2 = "This is a very famous quote";

char** tokens1 = get_tokens(str1); 
printf("%s", tokens1[0]);            <----- prints out "abc" correct output
char** tokens2 = get_tokens(str2);
printf("%s", tokens1[0]);            <----- prints out "s" incorrect output

get_tokens function (Returns the 2d array)

char** get_tokens(const char* str) {
  // implement me
  int num_tokens = count_tokens(str); 

  char delim[] = " ";
  int str_length = strlen(str);
  char* new_str = malloc(str_length); 
  strcpy(new_str, str); 

  char* ptr = strtok(new_str, delim);
  int index = 0;

  char** array_2d = malloc(sizeof(char*) *num_tokens);

  while (ptr != NULL){
    if (check_string(ptr) == 0){

      array_2d[index] = ptr; 
      index++;
    }

    ptr = strtok(NULL, delim); 
  } 

  free(new_str); 
  new_str = NULL; 

  free(ptr);
  ptr = NULL; 

  return array_2d;
}  

count_tokens function (returns the number of valid strings)

for example count_tokens("AB + abc EF++aG hi jkL") returns 2 because only "abc" and "hi" are valid

int count_tokens(const char* str) {
  // implement me
  //Seperate string using strtok

  char delim[] = " ";
  int str_length = strlen(str);
  char* new_str = malloc(str_length); 
  strcpy(new_str, str); 

  char* ptr = strtok(new_str, delim); 

  int counter = 0; 

  while (ptr != NULL){
    if (check_string(ptr) == 0){
      counter++;
    }

    ptr = strtok(NULL, delim); 
  }
  free(new_str);     
  return counter;
}  


Lastly check_string() checks if a string is valid

For example check_string("Ab") is invalid because there is a A inside.

using strtok to split "Ab + abc EF++aG hi jkL" into separate parts

int check_string(char* str){ 
  // 0 = false 
  // 1 = true
  int invalid_chars = 0; 

   for (int i = 0; i<strlen(str); i++){
     int char_int_val = (int) str[i];
     if (!((char_int_val >= 97 && char_int_val <= 122))){
        invalid_chars = 1; 
    }
   }

  return invalid_chars;  
}


Any help would be much appreciated. Thank you for reading.

If you have any questions about how the code works please ask me. Also I'm new to stackoverflow, please tell me if I have to change something.


Solution

  • You have a few problems in your code. First I'll repeat what I've said in the comments:

    • Not allocating enough space for the string copies. strlen does not include the NUL terminator in its length, so when you do
    char* new_str = malloc(str_length); 
    strcpy(new_str, str);
    

    new_str overflows by 1 when strcpy adds the '\0', invoking undefined behavior. You need to allocate one extra:

    char* new_str = malloc(str_length + 1); 
    strcpy(new_str, str);
    

    Your final problem is because of this:

    // copy str to new_str, that's correct because strtok
    // will manipulate the string you pass into it
    strcpy(new_str, str);  
    // get the first token and allocate size for the number of tokens,
    // so far so good (but you should check that malloc succeeded)
    char* ptr = strtok(new_str, delim);
    char** array_2d = malloc(sizeof(char*) *num_tokens);
    
    while (ptr != NULL){
        if (check_string(ptr) == 0){
          // whoops, this is where the trouble starts ...
          array_2d[index] = ptr; 
          index++;
        }
        // get the next token, this is correct
        ptr = strtok(NULL, delim); 
      } 
      // ... because you free new_str
      free(new_str); 
    

    ptr is a pointer to some token in new_str. As soon as you free(new_str), Any pointer pointing to that now-deallocated memory is invalid. You've loaded up array_2d with pointers to memory that's no longer allocated. Trying to access those locations again invokes undefined behavior. There's two ways I can think of off the top to solve this:

    1. Instead of saving pointers that are offsets to new_str, find the same tokens in str (the string from main) and point to those instead. Since those are defined in main, they will exist for as long as the program exists.
    2. Allocate some more memory, and strcpy the token into array_2d[index]. I'll demonstrate this below:
    while (ptr != NULL){
        if (check_string(ptr) == false)
        {
          // allocate (enough) memory for the pointer at index
          array_2d[index] = malloc(strlen(ptr) + 1);
          // you should _always_ check that malloc succeeds
          if (array_2d[index] != NULL)
          {
              // _copy_ the string pointed to by ptr into our new space rather
              // than simply assigning the pointer
              strcpy(array_2d[index], ptr);
          }
          else { /* handle no mem error how you want */ }
          index++;
        }
    
        ptr = strtok(NULL, delim); 
    }
    
    // now we can safely free new_str without invalidating anything in array_2d
    free(new_str); 
    

    I have a working demonstration here. Note some other changes in the demo:

    • #include <stdbool.h> and used that instead of 0 and 1 ints.
    • Changed your get_tokens function a bit to "return" the number of tokens. This is useful in main for printing them out.
    • Replaced the ASCII magic numbers with their characters.
    • Removed the useless freedPointer = NULL lines.
    • Changed your ints to size_t types for everything involving a size.

    One final note, while this is a valid implementation, it's probably doing a bit more work than it needs to. Rather than counting the number of tokens in a first pass, then retrieving them in a second pass, you can surely do everything you want in a single pass, but I'll leave that as an exercise to you if you're so inclined.