Search code examples
cfopenfwritefreadstrcmp

c - reading and altering string from csv file


So I have a csv file opened in excel and I'm trying to copy everything in that file and write it in a new file with alterations. I have a 4th column filled with a specific food in each row and I want to change it so that it only prints out the first letter of each food in the new file. Like 'Meatballs' should print as 'M'. The problem is I keep getting a segmentation fault and I have no clue what's causing it. I spent hours trying to fix this simple problem with no success. Any help would be appreciated.

    char readLine[lineCount];
    while(fgets(readLine, lineCount, fd)) {
        char* tmp = strdup(readLine);

        if(strcmp(getfield(tmp, 1), "Food") == 0){

            if(strcmp(getfield(tmp, 4), "Meatballs") == 0){
                fprintf(ft, "%c,", 'M');

            }else if(strcmp(getfield(tmp, 4), "Icecream") == 0){
                fprintf(ft, "%c,", 'I');
            }   

        }

        fprintf(ft, "\n");
        free(tmp);
    } 

getfield function:

const char* getfield(char* line, int num){
    const char* tok;

    while(tok = strsep(&line, ","))
    {   
        if(!--num) {
            printf("%s\n", tok);
            return tok;
        }
    }
    return NULL;
}

Solution

  • @Seamus's explanation is exactly correct. What happens when you pass tmp to, e.g. getfield(tmp, 1), getfield() receives a copy of the pointer tmp in line that points to the original address of tmp, but has an address all its own. Then in getfield() you call while(tok = strsep(&line, ",")) which modifies the string pointed to by line writing a nul-character in place of each delimiter and updating line to point one-past that delimiter so the next token can be parsed.

    Since you are not returning line and you operate on a copy of the original pointer in getfield(), the changes to tmp back in the caller are lost after the first call to getfield(), the delimiters have been overwritten by the nul-character and your next call to getfield() fails when the nul-character is encountered instead of a delimiter.

    Slightly different than @Seamus, I would make a copy of the string to operate on in getfield() and return an allocated copy of the token within the block of memory allocated for the copy leaving the original string unchanged. The also places the burden on the caller to free the memory allocated to the token. This would allow obtaining however many random fields within the comma-separated-line you desire. If you are processing the fields in sequence, then you can gain a bit of efficiency by simply keeping track of the number of tokens you have processed.

    Ideally, you would simply tokenize the line reading all tokens into an allocated array of pointers that could be returned to the caller with a single call to your tokenization function.

    After your call to obtain the first-field and have compared it to "Food", all you need to do is get the fourth-field and then simply derefernce the pointer returned in order to obtain the first character. You need not compare each fourth-field against, e.g. "Meatballs" or "Icecream" before obtaining the first-character. That is unnecessary.

    Putting it altogether in a short example that takes the csv filename as the first argument (or reads from stdin if no argument is given), you could do the following:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    
    #define MAXC 1024
    
    /* returns allocated token for field on success, NULL otherwise */
    char *getfield (char *buf, size_t field)
    {
        size_t len = strlen(buf);       /* size of input string */
        char *cpy = malloc (len + 1),   /* allocate for copy */
            *p,                         /* pointer to use with strsep */
            *tok = NULL;                /* token for requested field */
    
        if (!cpy)                       /* validate allocation */
            return NULL;
    
        memcpy (cpy, buf, len + 1);     /* copy buf to cpy */
        p = cpy;                        /* pointer to cpy, preserves cpy address */
    
        while (field-- && (tok = strsep (&p, ","))) {}  /* get field field */
    
        /* copy tok to cpy and return cpy on success or NULL on failure */
        return tok ? memmove (cpy, tok, strlen(tok) + 1) : NULL;
    }
    
    int main (int argc, char **argv) {
    
        char buf[MAXC];
        /* use filename provided as 1st argument (stdin by default) */
        FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
    
        if (!fp) {  /* validate file open for reading */
            perror ("file open failed");
            return 1;
        }
    
        while (fgets (buf, MAXC, fp)) {
            char *first = getfield (buf, 1);
            char *fourth = getfield (buf, 4);
            if (first && fourth && strcmp (first, "Food") == 0)
                printf ("%c\n", *fourth);
            free (first);
            free (fourth);
        }
    
        if (fp != stdin)   /* close file if not stdin */
            fclose (fp);
    
        return 0;
    }
    

    (note: memmove() required because cpy and tok overlap)

    There are multiple ways to approach this. All are fine as long as the do the job and are reasonably efficient, and free all memory they have allocated. Look things over and let me know if you have further questions.