
Extract numerical values from a string and average them


I have a .txt file that contains data in this format:

xxxx: 0.9467,  
yyyy: 0.9489,  
zzzz: 0.78973,  
hhhh: 0.8874,  
yyyy: 0.64351,  
xxxx: 0.8743,

and so on...

Let's say that my C program receives, as input, the string yyyy. The program should, simply, return all the instances of yyyy in the .txt file and the average of all their numerical values.

#include <stdio.h>
#include <string.h>

int main() {
    FILE *filePTR;
    char fileRow[100000];

    /* fopen_s comes from the optional Annex K and is mainly available with MSVC */
    if (fopen_s(&filePTR, "file.txt", "r") == 0) {
        while (fgets(fileRow, sizeof fileRow, filePTR) != NULL) {
            if (strstr(fileRow, "yyyy") != NULL) { // Input parameter
                printf("%s", fileRow);
            }
        }
        fclose(filePTR);
        printf("\nEnd of the file.\n");
    } else {
        printf("ERROR! Impossible to read the file.");
    }
    return 0;
}

This is my code right now. I don't know how to:

  1. Isolate the numerical values
  2. Convert them to double
  3. average them

I read something about the strtok function (just to start), but I would need some help...


Solution

  • You have started off on the right track, and using fgets() to read a complete line from the file on each iteration is the right choice. However, strstr() matches the search string anywhere in the line, so it does not ensure that the prefix you are looking for is found at the beginning of the line.

    Further, you want to avoid hardcoding your search string as well as the name of the file to open. main() takes arguments through argc and argv that let you pass information into your program on startup. See: C11 Standard - §5.1.2.2.1 Program startup (p1). Using these parameters lets you pass the filename to open and the prefix to search for as arguments to your program, which also eliminates the need to recompile your code simply to read from another file or search for another string.

    For example, instead of hardcoding values, you can use the parameters to main() to open any file and search for any prefix simply using something similar to:

    #include <stdio.h>
    #include <string.h>
    
    #define MAXC 1024   /* if you need a constant, #define one (or more) */
    
    int main (int argc, char **argv) {
    
        char buf[MAXC] = "", *str = NULL;   /* buffer for line and ptr to search str */
        size_t n = 0, len = 0;              /* counter and search string length */
        double sum = 0;                     /* sum of matching lines */
        FILE *fp = NULL;                    /* file pointer */
    
        if (argc < 3) { /* validate 2 arguments given - filename, search_string */ 
            fprintf (stderr, "error: insufficient number of arguments\n"
                    "usage: %s filename search_string\n", argv[0]);
            return 1;
        }
    
        if (!(fp = fopen (argv[1], "r"))) { /* open/validate file open for reading */
            perror ("fopen-filename");
            return 1;
        }
        str = argv[2];                      /* set pointer to search string */
        len = strlen (str);                 /* get length of search string */
        ...
    

    At this point in your program, you have opened the file passed as the first argument and have validated that it is open for reading through the file-stream pointer fp. You have passed in the prefix to search for as the second argument, assigned it to the pointer str, and stored the length of the prefix in len.

    Next you want to read each line from your file into buf, but instead of attempting to match the prefix with strstr(), you can use strncmp() with len to compare against the beginning of the line read from your file. If the prefix is found, you can then use sscanf() to parse the double value from the line, add it to sum, and increment the count of values in n, e.g.

        while (fgets (buf, MAXC, fp)) {             /* read each line into buf */
            if (strncmp (buf, str, len) == 0) {     /* if prefix matches */
                double tmp;                         /* temporary double for parse */
                /* parse with scanf, discarding prefix with assignment suppression */
                if (sscanf (buf, "%*1023[^:]: %lf", &tmp) == 1) {
                    sum += tmp;             /* add value to sum */
                    n++;                    /* increment count of values */
                }
            }
        }
    

    (note: the assignment suppression operator '*' in the sscanf() format string above lets you read and discard the prefix without having to store it in a second string; the literal ':' in the format then matches the separator)

    All that remains is checking whether any values were found by testing your count n and, if so, outputting the average for the prefix. Otherwise, if n == 0, the prefix was not found in the file, e.g.:

        fclose (fp);    /* close file, reading is done */

        if (n)  /* if values found, output average */
            printf ("prefix '%s' avg: %.4f\n", str, sum / n);
        else    /* output not found */
            printf ("prefix '%s' -- not found in file.\n", str);
    }
    

    That is basically all you need. With it, you can read from any file you like and search for any prefix simply by passing the filename and prefix as the first two arguments to your program. The complete example would be:

    #include <stdio.h>
    #include <string.h>
    
    #define MAXC 1024   /* if you need a constant, #define one (or more) */
    
    int main (int argc, char **argv) {
    
        char buf[MAXC] = "", *str = NULL;   /* buffer for line and ptr to search str */
        size_t n = 0, len = 0;              /* counter and search string length */
        double sum = 0;                     /* sum of matching lines */
        FILE *fp = NULL;                    /* file pointer */
    
        if (argc < 3) { /* validate 2 arguments given - filename, search_string */ 
            fprintf (stderr, "error: insufficient number of arguments\n"
                    "usage: %s filename search_string\n", argv[0]);
            return 1;
        }
    
        if (!(fp = fopen (argv[1], "r"))) { /* open/validate file open for reading */
            perror ("fopen-filename");
            return 1;
        }
        str = argv[2];                      /* set pointer to search string */
        len = strlen (str);                 /* get length of search string */
    
        while (fgets (buf, MAXC, fp)) {             /* read each line into buf */
            if (strncmp (buf, str, len) == 0) {     /* if prefix matches */
                double tmp;                         /* temporary double for parse */
                /* parse with scanf, discarding prefix with assignment suppression */
                if (sscanf (buf, "%*1023[^:]: %lf", &tmp) == 1) {
                    sum += tmp;             /* add value to sum */
                    n++;                    /* increment count of values */
                }
            }
        }
    
        fclose (fp);    /* close file, reading is done */

        if (n)  /* if values found, output average */
            printf ("prefix '%s' avg: %.4f\n", str, sum / n);
        else    /* output not found */
            printf ("prefix '%s' -- not found in file.\n", str);
    }
    

    Example Use/Output

    Using your data file stored in dat/prefixdouble.txt, you can search for each prefix in the file and obtain the average, e.g.

    $ ./bin/prefixaverage dat/prefixdouble.txt hhhh
    prefix 'hhhh' avg: 0.8874
    
    $ ./bin/prefixaverage dat/prefixdouble.txt xxxx
    prefix 'xxxx' avg: 0.9105
    
    $ ./bin/prefixaverage dat/prefixdouble.txt yyyy
    prefix 'yyyy' avg: 0.7962
    
    $ ./bin/prefixaverage dat/prefixdouble.txt zzzz
    prefix 'zzzz' avg: 0.7897
    
    $ ./bin/prefixaverage dat/prefixdouble.txt foo
    prefix 'foo' -- not found in file.
    

    Much easier than having to recompile each time you want to search for another prefix. Look things over and let me know if you have further questions.