Search code examples
cparsingcharstrtok

Error when using strtok to split a long string into shorter strings


I have a function in which I'm trying to split a string but somehow it's stopping when spaces are read.

input.csv: 18820218,Northern Ireland,England,0,13,Friendly,Belfast,Ireland,FALSE

output.txt: 18820218,Northern,(null),(null),(null),(null),(null),(null),(null)

typedef struct
{
    long int date;
    char *h_team;
    char *a_team;
    int home_score;
    int away_score;
    char *reason;
    char *city;
    char *country;
    char *neutral_field;

}Data;


void open_output(char *string, FILE **output)
{       
    if((*output=fopen(string, "w")) == NULL)
    {
        printf("%s not found\n", string);
            exit(1);
    }
}

void alloc_Data(Data *d, int size)
{
    d->line1 = (char*)malloc(50*sizeof(char)); 
    d->h_team = (char*)malloc(30*sizeof(char)); 
    d->a_team = (char*)malloc(30*sizeof(char)); 
    d->reason = (char*)malloc(30*sizeof(char)); 
    d->city = (char*)malloc(30*sizeof(char)); 
    d->country = (char*)malloc(30*sizeof(char)); 
    d->neutral_field = (char*)malloc(9*sizeof(char)); 
}

void store(Data *d, FILE *output)
{
    char *string = "18820218,Northern Ireland,England,0,13,Friendly,"
                    "Belfast,Ireland,FALSE";
    char *char_date = malloc(10*sizeof(char));
    char *char_hscore = malloc(20*sizeof(char));
    char *char_ascore = malloc(3*sizeof(char));

    char *token;

    token = strtok(string, ",");    
    char_date = token;

    token = strtok(NULL, ",");
    d->h_team = token;  

    token = strtok(NULL, ",");
    d->a_team = token;  

    token = strtok(NULL, ",");
    char_hscore = token;

    token = strtok(NULL, ",");
    char_ascore = token;    

    token = strtok(NULL, ",");
    d->reason = token;  

    token = strtok(NULL, ",");
    d->city = token;    

    token = strtok(NULL, ",");
    d->country = token; 

    token = strtok(NULL, ",");
    d->neutral_field = token;   

    d->date = atoi(char_date);
    d->home_score = atoi(char_hscore);
    d->away_score = atoi(char_ascore);

    fprintf(output, "%li,%s,%s,%d,%d,%s,%s,%s,%s\n", d->date, d->h_team, 
            d->a_team, d->home_score, d->away_score, d->reason, d->city, 
            d->country, d->neutral_field );

    free(string);
    free(char_date);
    free(char_hscore);
    free(char_ascore);
}

int main(int argc, char *argv[])
{
    FILE *output;
    char *string = "saida.txt";

    open_output(string, &output);   

    Data *d;
    d = (Data*)malloc(sizeof(Data)); 
    alloc_Data(d);

    store(d, output);

    free(d);

    return 0;
}

Solution

  • Ana, I have watched your questions change over the past several iterations, and it is clear you know what pieces you need to put together, but you are somewhat making it more difficult on yourself than it needs to be by how you are trying to fit them together.

    The purpose for dynamically allocating your structures or data are (1) handle larger amounts of data than will fit within your program stack (not an issue here), (2) to allow you to grow or shrink the amount of storage you are using as your data needs fluctuate over the course of your program (also not an issue here), or (3) to allow you to tailor your storage needs based on data in use in your program. This last part seems to be what you are attempting, but by allocating a fixed size for your character arrays -- you completely lose the benefit of tailoring your allocation to the size of your data.

    In order to allocate storage for each of the strings contained in your data, you need to get the length of each string, and then allocate length + 1 characters for storage (the +1 for the nul-terminating character). While you can use malloc and then strcpy to accomplish the allocation and copy to the new block of memory, if you have strdup, that can do both for you in one function call.

    The quandary you are faced with is "Where do I store the data before I get the lengths and allocate an copy?" You can handle this in a number of ways. You can declare a jumble of different variables, and parse the data into individual variables to begin with (somewhat messy), you can allocate one struct with the fixed values to initially store the values (a good option, but calling malloc for 30 or 50 chars doesn't make a lot of sense when a fixed array will do), or you can declare a separate temporary struct with fixed array sizes to use (that way to collect the jumble of separate variables into a struct that can then easily be passed to your allocate function) Consider each, and use the one that works best for you.

    Your function return types do not make a whole lot of sense as they are. You need to choose a meaningful return type that will allow the function to indicate whether it succeeded or failed, and then return a value (or pointer to a value) that provides useful information to the remainder of your program. Gauging success/failure of a function is particularly important with functions that allocate memory or handle input or output.

    In addition to the return type you choose, you need to think through the parameters you are passing to each function. You need to think about what variables need to be available where in your function. Take your FILE* parameter. You never user the file outside of your store() function - so why do you have it declared in main() which causes you to have to worry about returning the open stream through a pointer -- that you don't use.

    With that in mind we can look at putting the pieces of your program together in a slightly manner.

    First, do not use magic numbers sprinkled throughout your code. (e.g. 9, 10, 20, 30, 50, etc..) Instead,

    #define MAXN  9     /* if you need constants, define one (or more) */
    #define MAXC 30
    #define MAXL 50
    

    (or you can use an enum for the same purpose)

    For the purpose of the example, you can use a struct that you will dynamically allocate for efficient storage of your data, and a temporary struct to aid in parsing values from your line of data. For example:

    typedef struct {    /* struct to hold dynamically allocated data */
        long date;      /* sized to exact number of chars required. */
        int home_score,
            away_score;
        char *h_team,
            *a_team,
            *reason,
            *city,
            *country,
            *neutral_field;
    } data_t;
    
    typedef struct {    /* temp struct to parse data from line */
        long date;      /* sized to hold largest anticipated data */
        int home_score,
            away_score;
        char h_team[MAXC],
            a_team[MAXC],
            reason[MAXC],
            city[MAXC],
            country[MAXC],
            neutral_field[MAXN];
    } data_tmp_t;
    

    Next, the entire purpose of your open_output function is to open a file for writing. It should return the open file stream on success or NULL otherwise, e.g.

    /* pass filename to open, returns open file stream pointer on
     * success, NULL otherwise.
     */
    FILE *open_output (const char *string)
    {       
        FILE *output = NULL;
    
        if ((output = fopen (string, "w")) == NULL)
            fprintf (stderr, "file open failed. '%s'.\n", string);
    
        return output;
    }
    

    Your alloc_data function is allocating a data struct and filling its values. It should return a pointer to the fully allocated and filled struct on success, or NULL on failure, e.g.

    /* pass temporary struct containing data, dynamic struct allocated,
     * each member allocated to hold exact number of chars (+ terminating
     * character). pointer to allocated struct returned on success,
     * NULL otherwise.
     */
    data_t *alloc_data (data_tmp_t *tmp)
    {
        data_t *d = malloc (sizeof *d); /* allocate structure */
    
        if (d == NULL)
            return NULL;
    
        d->date = tmp->date;
    
        /* allocate each string member with strdup. if not available,
         * simply use malloc (strlen(str) + 1), and then strcpy.
         */
        if ((d->h_team = strdup (tmp->h_team)) == NULL)
            return NULL;
        if ((d->a_team = strdup (tmp->a_team)) == NULL)
            return NULL;
    
        d->home_score = tmp->home_score;
        d->away_score = tmp->away_score;
    
        if ((d->reason = strdup (tmp->reason)) == NULL)
            return NULL;
        if ((d->city = strdup (tmp->city)) == NULL)
            return NULL;
        if ((d->country = strdup (tmp->country)) == NULL)
            return NULL;
        if ((d->neutral_field = strdup (tmp->neutral_field)) == NULL)
            return NULL;
    
        return d;   /* return pointer to allocated struct */
    }
    

    Any time you are allocating multiple values that are nested within a struct (or nested structs), get in the habit of writing a free_data function to free the memory you allocate in alloc_data. It is far better to write one free function to properly handle a complex structure you have allocated compared to sprinkling individual free calls around your code. There is no return to check when freeing a variable, so you can use a void function here:

    /* frees each allocated member of d, and then d itself */
    void free_data (data_t *d)
    {
        free (d->h_team);
        free (d->a_team);
        free (d->reason);
        free (d->city);
        free (d->country);
        free (d->neutral_field);
    
        free (d);
    }
    

    Your store() function is where most of the decisions and validation checks take place. The purpose of your code is to parse and then store a string in a filename. That should get you thinking about what parameters are required. The remainder of the file handling can all be internal to store() since the FILE is not used further in the calling function. Now, depending on how many writes you are doing, it may make perfect sense to declare and open the FILE once in main() and then pass an open (and validated) FILE* parameter, which would then require only a single fopen call and a final close in main(). For purposes here, all will be handled in store so you can check for any stream error after each write by checking the return of fclose.

    Since you are allocating and storing a struct which may be needed for further use in the calling function, choosing to return a pointer to the caller (or NULL on failure) makes for a good choice of return type for store(). You could do something like:

    /* parses data in string into separate values and stores data in string
     * to filename (note: use mode "a" to append instead of "w" which
     * truncates). returns pointer to fully-allocated struct on success,
     * NULL otherwise.
     */
    data_t *store (const char *string, const char *filename)
    {
        data_tmp_t tmp = { .date = 0 };
        data_t *d = NULL;
        FILE *output = open_output (filename);  /* no need to pass in */
                                                /* not used later in main */
        if (output == NULL) {   /* validate file open for writing */
            return NULL;
        }
    
        /* parse csv values with sscanf - avoids later need to convert values
         * validate all values successfully converted.
         */
        if (sscanf (string, "%ld,%29[^,],%29[^,],%d,%d,%29[^,],%29[^,]," 
                            "%29[^,],%8[^\n]",
                            &tmp.date, tmp.h_team, tmp.a_team, &tmp.home_score,
                            &tmp.away_score, tmp.reason, tmp.city, tmp.country,
                            tmp.neutral_field) != 9) {
            fprintf (stderr, "error: failed to parse string.\n");
            return NULL;
        }
    
        d = alloc_data (&tmp);  /* allocate d and deep-copy tmp to d */
        if (d == NULL) {        /* validate allocation/copy succeeded */
            perror ("malloc-alloc_data");
            return NULL;
        }
    
        /* output values to file */
        fprintf (output, "%ld,%s,%s,%d,%d,%s,%s,%s,%s\n", d->date, d->h_team, 
                d->a_team, d->home_score, d->away_score, d->reason, d->city, 
                d->country, d->neutral_field );
    
        if (fclose (output) == EOF) /* always validate close-after-write */
            perror ("stream error-output");
    
        return d;   /* return fully allocated/populated struct */
    }
    

    Your main() can then handle nothing more than your string that needs to be parsed, the filename to write the data to, and a pointer to the fully allocated struct resulting from the parse so it is available for further use. (it also takes the file to write to as the 1st argument to the program -- or it will write to "saida.txt" by default if no argument is provided, e.g.

    int main (int argc, char *argv[])
    {
        char *string = "18820218,Northern Ireland,England,0,13,Friendly,"
                        "Belfast,Ireland,FALSE";
        /* filename set to 1st argument (or "saida.txt" by default) */
        char *filename = argc > 1 ? argv[1] : "saida.txt";
        data_t *d = NULL;
    
        d = store (string, filename);   /* store string in filename */
    
        if (d == NULL) {    /* validate struct returned */
            fprintf (stderr, "error: failed to store string.\n");
            return 1;
        }
    
        /* output struct values as confirmation of what was stored in file */
        printf ("stored: %ld,%s,%s,%d,%d,%s,%s,%s,%s\n", d->date, d->h_team, 
                d->a_team, d->home_score, d->away_score, d->reason, d->city, 
                d->country, d->neutral_field );
    
        free_data (d);  /* free all memory when done */
    
        return 0;
    }
    

    While not mandated by the C Standard, the "standard" coding style for C avoids the use of camelCase or MixedCase variable names in favor of all lower-case while reserving upper-case names for use with macros and constants. It is a matter of style -- so it is completely up to you, but failing to follow it can lead to the wrong first impression in some circles.

    Putting it altogether, you could do something like the following:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    
    #define MAXN  9     /* if you need constants, define one (or more) */
    #define MAXC 30
    #define MAXL 50
    
    typedef struct {    /* struct to hold dynamically allocated data */
        long date;      /* sized to exact number of chars required. */
        int home_score,
            away_score;
        char *h_team,
            *a_team,
            *reason,
            *city,
            *country,
            *neutral_field;
    } data_t;
    
    typedef struct {    /* temp struct to parse data from line */
        long date;      /* sized to hold largest anticipated data */
        int home_score,
            away_score;
        char h_team[MAXC],
            a_team[MAXC],
            reason[MAXC],
            city[MAXC],
            country[MAXC],
            neutral_field[MAXN];
    } data_tmp_t;
    
    /* pass filename to open, returns open file stream pointer on
     * success, NULL otherwise.
     */
    FILE *open_output (const char *string)
    {       
        FILE *output = NULL;
    
        if ((output = fopen (string, "w")) == NULL)
            fprintf (stderr, "file open failed. '%s'.\n", string);
    
        return output;
    }
    
    /* pass temporary struct containing data, dynamic struct allocated,
     * each member allocated to hold exact number of chars (+ terminating
     * character). pointer to allocated struct returned on success,
     * NULL otherwise.
     */
    data_t *alloc_data (data_tmp_t *tmp)
    {
        data_t *d = malloc (sizeof *d); /* allocate structure */
    
        if (d == NULL)
            return NULL;
    
        d->date = tmp->date;
    
        /* allocate each string member with strdup. if not available,
         * simply use malloc (strlen(str) + 1), and then strcpy.
         */
        if ((d->h_team = strdup (tmp->h_team)) == NULL)
            return NULL;
        if ((d->a_team = strdup (tmp->a_team)) == NULL)
            return NULL;
    
        d->home_score = tmp->home_score;
        d->away_score = tmp->away_score;
    
        if ((d->reason = strdup (tmp->reason)) == NULL)
            return NULL;
        if ((d->city = strdup (tmp->city)) == NULL)
            return NULL;
        if ((d->country = strdup (tmp->country)) == NULL)
            return NULL;
        if ((d->neutral_field = strdup (tmp->neutral_field)) == NULL)
            return NULL;
    
        return d;   /* return pointer to allocated struct */
    }
    
    /* frees each allocated member of d, and then d itself */
    void free_data (data_t *d)
    {
        free (d->h_team);
        free (d->a_team);
        free (d->reason);
        free (d->city);
        free (d->country);
        free (d->neutral_field);
    
        free (d);
    }
    
    /* parses data in string into separate values and stores data in string
     * to filename (note: use mode "a" to append instead of "w" which
     * truncates). returns pointer to fully-allocated struct on success,
     * NULL otherwise.
     */
    data_t *store (const char *string, const char *filename)
    {
        data_tmp_t tmp = { .date = 0 };
        data_t *d = NULL;
        FILE *output = open_output (filename);  /* no need to pass in */
                                                /* not used later in main */
        if (output == NULL) {   /* validate file open for writing */
            return NULL;
        }
    
        /* parse csv values with sscanf - avoids later need to convert values
         * validate all values successfully converted.
         */
        if (sscanf (string, "%ld,%29[^,],%29[^,],%d,%d,%29[^,],%29[^,]," 
                            "%29[^,],%8[^\n]",
                            &tmp.date, tmp.h_team, tmp.a_team, &tmp.home_score,
                            &tmp.away_score, tmp.reason, tmp.city, tmp.country,
                            tmp.neutral_field) != 9) {
            fprintf (stderr, "error: failed to parse string.\n");
            return NULL;
        }
    
        d = alloc_data (&tmp);  /* allocate d and deep-copy tmp to d */
        if (d == NULL) {        /* validate allocation/copy succeeded */
            perror ("malloc-alloc_data");
            return NULL;
        }
    
        /* output values to file */
        fprintf (output, "%ld,%s,%s,%d,%d,%s,%s,%s,%s\n", d->date, d->h_team, 
                d->a_team, d->home_score, d->away_score, d->reason, d->city, 
                d->country, d->neutral_field );
    
        if (fclose (output) == EOF) /* always validate close-after-write */
            perror ("stream error-output");
    
        return d;   /* return fully allocated/populated struct */
    }
    
    int main (int argc, char *argv[])
    {
        char *string = "18820218,Northern Ireland,England,0,13,Friendly,"
                        "Belfast,Ireland,FALSE";
        /* filename set to 1st argument (or "saida.txt" by default) */
        char *filename = argc > 1 ? argv[1] : "saida.txt";
        data_t *d = NULL;
    
        d = store (string, filename);   /* store string in filename */
    
        if (d == NULL) {    /* validate struct returned */
            fprintf (stderr, "error: failed to store string.\n");
            return 1;
        }
    
        /* output struct values as confirmation of what was stored in file */
        printf ("stored: %ld,%s,%s,%d,%d,%s,%s,%s,%s\n", d->date, d->h_team, 
                d->a_team, d->home_score, d->away_score, d->reason, d->city, 
                d->country, d->neutral_field );
    
        free_data (d);  /* free all memory when done */
    
        return 0;
    }
    

    Example Use/Output

    $ ./bin/store_teams dat/saida.txt
    stored: 18820218,Northern Ireland,England,0,13,Friendly,Belfast,Ireland,FALSE
    

    Verify Output File

    $ cat dat/saida.txt
    18820218,Northern Ireland,England,0,13,Friendly,Belfast,Ireland,FALSE
    

    Memory Use/Error Check

    There is no need to cast the return of malloc, it is unnecessary. See: Do I cast the result of malloc?

    In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed.

    It is imperative that you use a memory error checking program to insure you do not attempt to access memory or write beyond/outside the bounds of your allocated block, attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.

    For Linux valgrind is the normal choice. There are similar memory checkers for every platform. They are all simple to use, just run your program through it.

    $ valgrind ./bin/store_teams dat/saida.txt
    ==16038== Memcheck, a memory error detector
    ==16038== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
    ==16038== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
    ==16038== Command: ./bin/store_teams dat/saida.txt
    ==16038==
    stored: 18820218,Northern Ireland,England,0,13,Friendly,Belfast,Ireland,FALSE
    ==16038==
    ==16038== HEAP SUMMARY:
    ==16038==     in use at exit: 0 bytes in 0 blocks
    ==16038==   total heap usage: 8 allocs, 8 frees, 672 bytes allocated
    ==16038==
    ==16038== All heap blocks were freed -- no leaks are possible
    ==16038==
    ==16038== For counts of detected and suppressed errors, rerun with: -v
    ==16038== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
    

    Always confirm that you have freed all memory you have allocated and that there are no memory errors.

    Hopefully this helps you understand how better to approach putting the puzzle pieces together in a less jumbled way and how to focus on what parameters are needed by each function, and how to think about choosing a meaningful type return for each of your functions. Look things over and let me know if you have further questions.