
How to buffer data read from a large file without newlines


I am reading from a file that contains several thousand floats written as plain text, in sections that are separated by newlines. The floats themselves are separated by whitespace, and occasionally by a semicolon (to separate every group of 3). A newline character doesn't appear until the very end of a section, an unknown number (probably tens of thousands) of characters later.

The language I'm using is C.

3Dmodel.txt
-----

Obj1 Vertice count=5842;
{
0.499507 -0.003674 0.699311; 0.454010 -0.075165 ... -0.022236 \n (newline)
}

My question is, what is the best way to read these strings from the file into memory?

It seems I cannot use fgets(), because the newline is so far out, and because it may end reading in the middle of a float, leaving it incomplete. Reading the entire file into memory seems needlessly expensive, though it wouldn't be terrible if it were the only way, as each file is only 2 MB to 10 MB in size.


Solution

  • it may end reading in the middle of a float ...

    That's not a problem for fgets: if a float is cut off at the end of the buffer, fseek back to the beginning of that float and continue reading from there. Example:

    /* data */

    1.23 2.12 3.24 98.88 78.243 3.34 3.4 23.5 54.5
    7.8 9.0
    

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    
    int main(void)
    {
        char s[16], *p, *q;
        double d;
        FILE *f;
    
        f = fopen("data", "r");
        if (f == NULL) {
            perror("fopen");
            exit(EXIT_FAILURE);
        }
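        while ((p = fgets(s, sizeof s, f)) != NULL) {
            while (1) {
                d = strtod(p, &q);
                if (p == q) break; /* no more numbers in this chunk */
                /* The number touches the end of the chunk, so it may be cut.
                   Skip the check at EOF or when the token fills the whole
                   buffer: seeking back would re-read the same bytes forever. */
                if (*q == '\0' && !feof(f) && p != s) {
                    printf("Cut at <%s>, adjusting ...\n", p);
                    /* cast before negating: strlen returns an unsigned size_t */
                    fseek(f, -(long)strlen(p), SEEK_CUR);
                    break;
                }
                printf("%f\n", d);
                p = q;
            }
        }
        fclose(f);
        return 0;
    }
    

    Output:

    1.230000
    2.120000
    3.240000
    98.880000
    78.243000
    Cut at < 3.>, adjusting ...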
    3.340000
    3.400000
    23.500000
    54.500000
    7.800000
    9.000000
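
The sample data above has no semicolons, while the file in the question uses them to separate groups of 3. As an alternative sketch (not the method shown above), fscanf with "%lf" skips whitespace, newlines included, and stops at the first character that cannot be part of a number, so no float is ever split and no seeking is needed. The helper name read_floats and its parameters are hypothetical, chosen only for illustration, and it assumes the stream is already positioned just past the opening "{".

```c
#include <stdio.h>

/* Hypothetical helper: reads up to max numbers from f into out.
   Whitespace (including newlines) and ';' separators are skipped.
   Reading stops at EOF or at the first other character (e.g. '}'),
   which is left in the stream. Returns the count of numbers read. */
size_t read_floats(FILE *f, double *out, size_t max)
{
    size_t n = 0;

    while (n < max) {
        int rc = fscanf(f, "%lf", &out[n]);
        if (rc == 1) {          /* one number converted */
            n++;
            continue;
        }
        if (rc == EOF)          /* end of file or read error */
            break;

        /* Matching failure: the next character is not part of a number. */
        int c = fgetc(f);
        if (c != ';') {         /* not a separator: put it back and stop */
            if (c != EOF)
                ungetc(c, f);
            break;
        }
    }
    return n;
}
```

A caller would first skip the header, for example by reading characters with fgetc until it sees '{', then call read_floats with an array sized from the vertex count in the header. stdio already buffers the file internally, so this reads the data in blocks without holding the whole file in memory.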