
How can I read and obtain separated data from a file using 'fread' in C?


I've written in a file (using 'fwrite()') the following:

TUS�ABQ���������������(A����������(A��B������(A��B���A��(A��B���A������B���A������0����A������0�ABQ�������0�ABQ�����LAS����������������A�����������A��&B�������A��&B��B���A��&B��B������&B��
B����153���B����153�LAS�����153�LAS�����LAX���������������:A����������:AUUB������:AUUB��B��:
AUUB��B����UUB��B����������B��������LAX���������LAX�����MDW���������������A����������A��(�������A��(����A��A��(����A������(����A����A�89���A����A�89MDW�����A�89MDW�����OAK���������
����������������������@�����������@�����������@�����������@�������������������������OAK���������OAK�����SAN���������������LA����������LA��P@������LA��P@��@A��LA��P@��@A������P@��@A����������@A��������SAN���������SAN�����TPA�ABQ����������������B�����������B��@�����...(continues)

which is translated to this:

TUSLWD2.103.47.775.1904.06.40.03AMBRFD4.63.228.935.0043.09.113.0ASDGHU5.226.47.78.3.26...(The same structure)

and the hexdump of that would be:

00000000  54 55 53 00 41 42 51 00  00 00 00 00 00 00 00 00  |TUS.ABQ.........|
00000010  00 00 00 00 00 00 28 41  00 00 0e 42 00 00 f8 41  |......(A...B...A|
00000020  00 00 00 00 4c 41 53 00  00 00 00 00 00 00 00 00  |....LAS.........|
00000030  00 00 00 00 00 00 88 41  00 00 26 42 9a 99 11 42  |.......A..&B...B|
(Continues...)

The structure is always two words of three characters each (e.g. TUS and LWD) followed by 7 floats, and this repeats on and on until the end of the file.

The key thing is: I just want to read every field separately, like 'TUS', 'LWD', '2.10', '3.4', '7.77'...

And I can only use 'fread()' to achieve that! For now, I'm trying this:

aux2 = 0;
fseek(fp, SEEK_SET, 0);
fileSize = 0;
while (!feof(fp) && aux<=2) {
    fread(buffer, sizeof(char)*4, 1, fp);
    printf("%s", buffer);
    fread(buffer, sizeof(char)*4, 1, fp);
    printf("%s", buffer);
    for(i=0; i<7; i++){
        fread(&delay, sizeof(float), 1, fp);
        printf("%f", delay);
    }
    printf("\n");
    aux++;
    fseek(fp,sizeof(char)*7+sizeof(float)*7,SEEK_SET);
    aux2+=36;
}

And I get this result:

TUSABQ0.0000000.0000000.00000010.5000000.0000000.00000010.500000
AB0.0000000.000000-10384675421112248092159136000638976.0000000.0000000.000000-10384675421112248092159136000638976.0000000.000000
AB0.0000000.000000-10384675421112248092159136000638976.0000000.0000000.000000-10384675421112248092159136000638976.0000000.000000

But it does not work correctly...

*Note: ignore the arguments of the last 'fseek()', because I've been trying too many meaningless things! To write the words (e.g. TUS) into the file, I use this:

fwrite(x->data->key, 4, sizeof(char), fp);

and to write the floats, this:

for (i = 0; i < 7; i++) {
    fwrite(&current->data->retrasos[i], sizeof(float), sizeof(float), fp);
}

Solution

  • I'd recommend using a structure to hold each data unit:

    typedef struct {
        float  value[7];
        char   word1[5];  /* 4 + '\0' */
        char   word2[5];  /* 4 + '\0' */
    } unit;
    

    To make the file format portable, you need functions that pack and unpack the above structure to and from a 36-byte array. On Intel and AMD architectures, float corresponds to the IEEE 754-2008 binary32 format in little-endian byte order. For example,

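    #include <errno.h>   /* errno, EINVAL */
    #include <stddef.h>  /* size_t */
    #include <string.h>  /* memcpy */
    
    #define STORAGE_UNIT (4+4+7*4)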
    
    #if defined(__i386) || defined(_M_IX86) || defined(__x86_64__) || defined(_M_X64)
    
    size_t unit_pack(char *target, const size_t target_len, const unit *source)
    {
        size_t i;
    
        if (!target || target_len < STORAGE_UNIT || !source) {
            errno = EINVAL;
            return 0;
        }
    
        memcpy(target + 0, source->word1, 4);
        memcpy(target + 4, source->word2, 4);
    
        for (i = 0; i < 7; i++)
            memcpy(target + 8 + 4*i, &(source->value[i]), 4);
    
        return STORAGE_UNIT;
    }
    
    size_t unit_unpack(unit *target, const char *source, const size_t source_len)
    {
        size_t i;
    
        if (!target || !source || source_len < STORAGE_UNIT) {
            errno = EINVAL;
            return 0;
        }
    
        memcpy(target->word1, source, 4);
        target->word1[4] = '\0';
    
        memcpy(target->word2, source + 4, 4);
        target->word2[4] = '\0';
    
        for (i = 0; i < 7; i++)
            memcpy(&(target->value[i]), source + 8 + i*4, 4);
    
        return STORAGE_UNIT;
    }
    
    #else
    #error Unsupported architecture!
    #endif
    

    The above only works on Intel and AMD machines, but it is easy to extend to other architectures if necessary. (Almost all machines currently use IEEE 754-2008 binary32 for float; only the byte order varies. Those that do not typically provide C extensions that convert to and from their internal formats.)
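
    As a rough sketch of what such an extension could look like, the two helpers below store a float as four bytes in little-endian order regardless of the host byte order. The names float_to_le and float_from_le are hypothetical, and the sketch assumes that float is IEEE 754 binary32 and uses the same byte order as uint32_t on the host:

    ```c
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical helper: store 'value' in dst[0..3], least significant byte first.
       Assumes float is IEEE 754 binary32 with the same byte order as uint32_t. */
    static void float_to_le(unsigned char *dst, float value)
    {
        uint32_t bits;

        memcpy(&bits, &value, sizeof bits);   /* reinterpret the float's bits */
        dst[0] = (unsigned char)( bits        & 0xFFu);
        dst[1] = (unsigned char)((bits >>  8) & 0xFFu);
        dst[2] = (unsigned char)((bits >> 16) & 0xFFu);
        dst[3] = (unsigned char)((bits >> 24) & 0xFFu);
    }

    /* Hypothetical helper: rebuild a float from four little-endian bytes. */
    static float float_from_le(const unsigned char *src)
    {
        uint32_t bits = (uint32_t)src[0]
                      | ((uint32_t)src[1] <<  8)
                      | ((uint32_t)src[2] << 16)
                      | ((uint32_t)src[3] << 24);
        float value;

        memcpy(&value, &bits, sizeof value);
        return value;
    }
    ```

    With these, 10.5f is stored as the bytes 00 00 28 41, which is the pattern visible at offset 0x14 in the hexdump from the question. Inside unit_pack() and unit_unpack(), the per-float memcpy() calls could be swapped for these helpers if the buffers were declared as unsigned char.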

    Using the above, you can -- should! must! -- document your file format, for example as follows:

    Words are 4 bytes encoded in UTF-8
    Floats are IEEE 754-2008 binary32 values in little-endian byte order
    
    A file contains one or more units. Each unit consists of
    
        Name    Description
        word1   First word
        word2   Second word
        value0  First float
        value1  Second float
        value2  Third float
        value3  Fourth float
        value4  Fifth float
        value5  Sixth float
        value6  Seventh float
    
    There is no padding.
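
    Because every unit occupies exactly STORAGE_UNIT (36) bytes with no padding, unit number k (counting from 0) starts at byte k * 36, so fseek() can jump straight to it. A minimal sketch, where seek_unit is a hypothetical helper name; note that the fseek() argument order is stream, offset, origin:

    ```c
    #include <stdio.h>

    #define STORAGE_UNIT (4 + 4 + 7*4)   /* 36 bytes per unit, as documented above */

    /* Hypothetical helper: position 'stream' at the start of unit number k
       (counting from 0). Returns 0 on success, -1 on failure.
       Assumes k * STORAGE_UNIT fits in a long. */
    static int seek_unit(FILE *stream, long k)
    {
        if (!stream || k < 0)
            return -1;

        /* fseek() returns nonzero on failure. */
        return fseek(stream, k * (long)STORAGE_UNIT, SEEK_SET) ? -1 : 0;
    }
    ```

    The file should be opened in binary mode ("rb" or "wb") so that byte offsets are not affected by newline translation on platforms that perform it.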
    

    To write a unit, use a char array of size STORAGE_UNIT as a buffer, and write that. So, if you have unit *one, you can write it to FILE *out using

        char  buffer[STORAGE_UNIT];
    
        if (!unit_pack(buffer, sizeof buffer, one)) {
            /* Error! Abort program! */
        }
        if (fwrite(buffer, STORAGE_UNIT, 1, out) != 1) {
            /* Write error! Abort program! */
        }
    

    Correspondingly, reading from FILE *in would be

        char buffer[STORAGE_UNIT];
    
        if (fread(buffer, STORAGE_UNIT, 1, in) != 1) {
            /* End of file, or read error.
               Check feof(in) and/or ferror(in). */
        }
        if (!unit_unpack(one, buffer, STORAGE_UNIT)) {
            /* Error! Abort program! */
        }
    

    If one is an array of units, and you are writing or reading one[k], use &(one[k]) (or equivalently one + k) instead of one.
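
    Putting the reading side together, here is a minimal sketch of a loop that uses only fread() to walk the whole file and print every field separately. The names read_unit and print_units are hypothetical, and the unpacking is inlined with memcpy(), which assumes the host float is little-endian IEEE 754 binary32, as in the x86 branch above:

    ```c
    #include <stdio.h>
    #include <string.h>

    #define STORAGE_UNIT (4 + 4 + 7*4)

    typedef struct {
        float  value[7];
        char   word1[5];  /* 4 + '\0' */
        char   word2[5];  /* 4 + '\0' */
    } unit;

    /* Hypothetical helper: read one 36-byte unit from 'in'.
       Returns 1 on success, 0 at a clean end of file,
       -1 on a read error or a truncated record. */
    static int read_unit(FILE *in, unit *out)
    {
        char   buffer[STORAGE_UNIT];
        size_t n = fread(buffer, 1, sizeof buffer, in);

        if (n == 0 && feof(in))
            return 0;
        if (n != sizeof buffer)
            return -1;

        memcpy(out->word1, buffer, 4);
        out->word1[4] = '\0';
        memcpy(out->word2, buffer + 4, 4);
        out->word2[4] = '\0';
        /* Assumes sizeof (float) == 4 and little-endian binary32 on the host. */
        memcpy(out->value, buffer + 8, sizeof out->value);
        return 1;
    }

    /* Hypothetical helper: print each unit on its own line, fields separated
       by spaces. Returns the number of units printed, or -1 on error. */
    static long print_units(FILE *in)
    {
        unit u;
        long count = 0;
        int  status, i;

        while ((status = read_unit(in, &u)) == 1) {
            printf("%s %s", u.word1, u.word2);
            for (i = 0; i < 7; i++)
                printf(" %.2f", u.value[i]);
            putchar('\n');
            count++;
        }
        return (status < 0) ? -1 : count;
    }
    ```

    The loop tests the result of each read instead of calling feof() beforehand, because feof() only becomes true after a read has already hit the end of the file.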