Search code examples
carraysoutputstrtok

Splitting a string into tokens and putting tokens into array - strtok


char** token_arr(char* str, int n_tokens)
{
   char**arr = malloc((n_tokens+1)*sizeof(char*));
   char str2[n_tokens + 1];
   strcpy(str2,str);
   int i = 0;
   char *p = strtok (str2, " ");

   while (p != NULL)
   {
       arr[i] = p;
       //printf("%s\n", arr[i]);
       p = strtok (NULL, " ");
       i++;
   }
 return arr;
}

The purpose of token_arr is to get a string and a number of tokens, then put the tokens into an array. The array of tokens is returned.

int main(void) {
  char*str1 = "( 8 + ( 41 - 12 ) )";
  char**expression = token_arr(str1, 9);
  for(int i = 0; i < 9; i++)
    printf("expression[%d] = %c\n", i, *expression2[i]);
 return 0;
}

Output:

expression2[0] = (
expression2[1] = 
expression2[2] = 
expression2[3] = 
expression2[4] = 
expression2[5] = 
expression2[6] = 
expression2[7] = 
expression2[8] =

Why is only the first value being printed? What's wrong with my code?


Solution

  • While I think you have probably got most of the issues sorted based on the comments, let's look at a way to address both the validation/return of expressions and a way to return the number of tokens to protect against an error in tokenization resulting in less than n_tokens being found.

    As you have learned, when you declare str2 local to token_arr, it has automatic storage duration and is only valid within the scope where it is declared. When token_arr returns, the memory holding str2 is released for re-use and any attempt to reference that memory back in main() invokes Undefined Behavior.

    What are your options? (1) use strdup to dynamically allocate storage for each token, copy the token to the new memory allocated, and then assign the starting address for the new block of memory containing the token to arr[i], e.g.

            arr[i] = strdup (p);
    

    or (2) do the same thing manually using strlen, malloc & memcpy, e.g.

            size_t len = strlen(p);
            arr[i] = malloc (len + 1);
            /* validate - here */
            memcpy (arr[i], p, len + 1);
    

    Now each arr[i] points to a block of memory having allocated storage duration which remains valid until free is called on that block -- or the program ends.

    What If Less Than n_tokens Are Found?

    If less than n_tokens are found within token_arr and you attempt to use n_tokens through expressions back in main() you will likely invoke Undefined Behavior again. To ensure you only use the tokens found in token_arr and made available in main() by the assignment to expression -- Pass A Pointer To n_tokens as the second parameter and update it will the value of i before you return arr;, e.g.

    char **token_arr (const char *str, int *n_tokens)
    {
        char **arr = malloc(*n_tokens * sizeof *arr);
        ...
            i++;
        }
        *n_tokens = i;  /* assign i to make tokes assigned available */
    
        return arr;
    }
    

    Now n_tokens back in main() contains only the number of tokens actually found and allocated for and assigned to arr[i] in token_arr.

    Validate Every Allocation

    It is critical that you validate every call to malloc, calloc, realloc, strdup or any other function that allocates memory for you. Allocation can, and does, fail. When it does, it let's you know by returning NULL instead of a pointer containing the beginning address for the new block of memory. Check every allocation.

    Putting it altogether, you could do something like:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    
    char **token_arr (const char *str, int *n_tokens)
    {
        char **arr = malloc(*n_tokens * sizeof *arr);
        char str2 [strlen(str) + 1];
        int i = 0;
    
        if (!arr) { /* validate every allocation */
            perror ("malloc-n_tokens");
            return NULL;
        }
    
        strcpy (str2, str);
    
        char *p = strtok (str2, " ");
    
        while (i < *n_tokens && p != NULL) {    /* check used pointers */
            arr[i] = strdup (p);
            if (!arr[i]) {  /* strdup allocates -> you must validate */
                perror ("strdup-arr[i]");
                if (i)          /* if tokens stored, break an return */
                    break;
                else {          /* if no tokes stored, free pointers */
                    free (arr);
                    return NULL;
                }
            }
            p = strtok (NULL, " ");
            i++;
        }
        *n_tokens = i;  /* assign i to make tokes assigned available */
    
        return arr;
    }
    
    int main (void) {
    
        char *str1 = "( 8 + ( 41 - 12 ) )";
        int n_tokens = 9;
        char **expression = token_arr (str1, &n_tokens);
    
        if (expression) {       /* validate token_arr succeeded */
            for (int i = 0; i < n_tokens; i++) { /* n_tokens times */
                printf ("expression[%d] = %s\n", i, expression[i]);
                free (expression[i]);   /* free mem allocated by strdup */
            }
            free (expression);
        }
    
        return 0;
    }
    

    (note: likewise check the return of token_arr before making use of the return)

    Example Use/Output

    $ ./bin/token_arr
    expression[0] = (
    expression[1] = 8
    expression[2] = +
    expression[3] = (
    expression[4] = 41
    expression[5] = -
    expression[6] = 12
    expression[7] = )
    expression[8] = )
    

    Memory Use/Error Check

    In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed.

    It is imperative that you use a memory error checking program to insure you do not attempt to access memory or write beyond/outside the bounds of your allocated block, attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.

    For Linux valgrind is the normal choice. There are similar memory checkers for every platform. They are all simple to use, just run your program through it.

    $ valgrind ./bin/token_arr
    ==8420== Memcheck, a memory error detector
    ==8420== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
    ==8420== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
    ==8420== Command: ./bin/token_arr
    ==8420==
    expression[0] = (
    expression[1] = 8
    expression[2] = +
    expression[3] = (
    expression[4] = 41
    expression[5] = -
    expression[6] = 12
    expression[7] = )
    expression[8] = )
    ==8420==
    ==8420== HEAP SUMMARY:
    ==8420==     in use at exit: 0 bytes in 0 blocks
    ==8420==   total heap usage: 10 allocs, 10 frees, 92 bytes allocated
    ==8420==
    ==8420== All heap blocks were freed -- no leaks are possible
    ==8420==
    ==8420== For counts of detected and suppressed errors, rerun with: -v
    ==8420== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
    

    Always confirm that you have freed all memory you have allocated and that there are no memory errors.

    Look things over and let me know if you have further questions.