
How to read values separated by punctuation from a text file in C


I was wondering how to read data from a text file whose fields are separated by commas.

For example, each line of the text file has the form: (Integer,Name Surname,IntegerArray)

This is one line: 123456789,Jonh Brown,123456434-4325234-235234-42345234

typedef struct BST
{
    long long ID;
    char *name;
    char *surname;
    long long *friendsID;
    struct BST *left;
    struct BST *right;
}BST;

reading from file:

do
{
    c = fscanf(fp,"%I64d%*c%s%s",&ID,name,surname);
    if (c != EOF)
        root=insertNewUser(root,ID,name,surname);
} while (c != EOF);

newNodeTemp->ID = ID;
newNodeTemp->name = (char*)calloc(strlen(name) + 1,sizeof(char));       /* +1 for the terminating '\0' */
newNodeTemp->surname = (char*)calloc(strlen(surname) + 1,sizeof(char)); /* +1 for the terminating '\0' */
strcpy(newNodeTemp->name,name);
strcpy(newNodeTemp->surname,surname);

But I do not know how to read the third field into BST->friendsID as an array, without the '-' (hyphen) separators.

This part: 123456434-4325234-235234-42345234

I defined the friends array as a pointer because its size is not known in advance, so I will use dynamic memory allocation.


Solution

  • If I understand your question correctly, you have a CSV file with user info, where the third field of each line holds that user's friends encoded as a hyphen-separated list of friend IDs. In that case you can use the re-entrant version of strtok (named strtok_r) to split the comma-separated fields, and then call plain strtok inside that outer loop to split the hyphen-separated values.

    Note that strtok_r (a POSIX function) takes an additional "save pointer" as its third argument, so that you can resume calls to that instance of strtok_r after making intermediate calls to a different instance of strtok or strtok_r for another separation task.

    Given your line of:

    "123456789,Jonh Brown,123456434-4325234-235234-42345234"
    

    where 123456789 is the ID, Jonh Brown is the name, and 123456434-4325234-235234-42345234 is the list of friend IDs, you can parse the line and the individual friends simply by keeping a field count and calling a separate instance of strtok within your tokenization loop to split the friends on hyphens.

    A short example would be:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    
    #define FRIENDS 3
    
    int main (void) {
    
        char line[] = "123456789,Jonh Brown,123456434-4325234-235234-42345234",
            *delims = ",\n",    /* strtok_r delimiters for tokens */
            *p      = line,     /* pointer to line */
            *sp     = p;        /* save pointer for reentrant strtok_r */
        int count = 0;
    
        /* must use reentrant version of strtok to nest strtok use for friends */
        p = strtok_r (line, delims, &sp);  /* 1st call uses name of buffer */
        count++;
    
        while (p) {             /* outer tokenization loop */
            printf ("token: '%s'\n", p);
            if (count == FRIENDS) {
                char *pf = calloc (strlen (p) + 1, 1),  /* pointer to friends   */
                    *delim2 = "-\n",                    /* delims for friends   */
                    *f;                                 /* pointer preserves pf */
                if (!pf) {
                    perror ("calloc-pf");
                    exit (EXIT_FAILURE);
                }
                strcpy (pf, p);                 /* copy friends token to pf */
                f = pf;                         /* set f, to pf, to preserve pf */
                f = strtok (f, delim2);         /* regular strtok OK for friends */
                if (f)
                    printf ("friends:\n");
                while (f) {     /* friends tokenization loop */
                    printf ("    %s\n", f);
                    f = strtok (NULL, delim2);  /* subsequent calls use NULL */
                }
                free (pf);      /* free allocated memory at preserved address */
                count = 0;      /* reset count */
            }
            p = strtok_r (NULL, delims, &sp);  /* subsequent calls use NULL */
            count++;
        }
    
        return 0;
    }
    

    (note: strtok modifies the string it tokenizes by overwriting each delimiter with a nul-character, so the example works on a copy of the friends token and leaves the original token intact. A pointer to the start of the allocated copy (pf) is preserved so that the copy can be freed once you are done separating the friends)

    (also note: if your system provides strdup, you can replace the calloc (strlen (p) + 1, 1) and strcpy (pf, p); calls with a single call, char *pf = strdup (p);. Since strdup also allocates dynamically, you should still validate with if (!pf) after the call, and free pf when you are done)
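
As a rough sketch of the strdup variant, the friends loop could be pulled into a small helper. The name print_friends is hypothetical (it is not part of the example above), and the sketch assumes a POSIX system where strdup is declared in <string.h>:

```c
#define _POSIX_C_SOURCE 200809L   /* assumption: POSIX system, exposes strdup */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: print every hyphen-separated friend ID in token.
 * Returns the number of IDs printed, or -1 if the copy could not be allocated.
 */
int print_friends (const char *token)
{
    const char *delim2 = "-\n";         /* delims for friends */
    char *pf = strdup (token);          /* allocate and copy in one call */
    int n = 0;

    if (!pf) {                          /* strdup allocates, so validate */
        perror ("strdup-pf");
        return -1;
    }

    /* strtok walks the copy, pf keeps the address needed by free */
    for (char *f = strtok (pf, delim2); f; f = strtok (NULL, delim2)) {
        printf ("    %s\n", f);
        n++;
    }

    free (pf);
    return n;
}
```

Calling print_friends (p) when count == FRIENDS would take the place of the calloc, strcpy, inner loop and free in the example above.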

    Example Use/Output

    $ ./bin/strtok_csv
    token: '123456789'
    token: 'Jonh Brown'
    token: '123456434-4325234-235234-42345234'
    friends:
        123456434
        4325234
        235234
        42345234
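
Since your struct stores the friends as long long *friendsID, the remaining step is to convert each friend token to a number and grow an array as you go. The following is only a minimal sketch of one way to do that: parse_friends is a hypothetical helper name, it uses strtoll with realloc, and it assumes you keep the returned count somewhere yourself (your struct as posted has no member for it).

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert a hyphen-separated field such as
 * "123456434-4325234" into a heap-allocated array of long long.
 * On success returns the array (caller frees it) and stores the number of
 * IDs in *n. Returns NULL with *n == 0 on allocation failure, on a token
 * that is not a valid number, or if the field holds no IDs at all.
 */
long long *parse_friends (const char *field, size_t *n)
{
    size_t len = strlen (field), used = 0, cap = 4;
    char *copy = malloc (len + 1);              /* strtok needs a writable copy */
    long long *ids = malloc (cap * sizeof *ids);

    *n = 0;
    if (!copy || !ids)
        goto fail;
    memcpy (copy, field, len + 1);              /* includes the nul-character */

    for (char *f = strtok (copy, "-\n"); f; f = strtok (NULL, "-\n")) {
        char *end;
        long long id;

        errno = 0;
        id = strtoll (f, &end, 10);
        if (end == f || *end != '\0' || errno == ERANGE)
            goto fail;                          /* not a number, or out of range */

        if (used == cap) {                      /* array full, double capacity */
            long long *tmp = realloc (ids, 2 * cap * sizeof *ids);
            if (!tmp)
                goto fail;
            ids = tmp;
            cap *= 2;
        }
        ids[used++] = id;
    }
    if (used == 0)                              /* no IDs found */
        goto fail;

    free (copy);
    *n = used;
    return ids;

fail:
    free (copy);
    free (ids);
    return NULL;
}
```

Because it uses plain strtok internally, it can be called on the third token from inside the strtok_r loop, for example newNodeTemp->friendsID = parse_friends (p, &nfriends); where nfriends is a size_t you would need to add.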
    

    Look things over and let me know if you have further questions.