
Why does using fread after putw expand the file in C?


I was reading data from a file with fread() and noticed that the file kept growing, which made no sense for a read operation. I wrote the code below and found that if I write to the file with putw() and then read from it (without closing and reopening it first), fread() extends the file so that it has something to read.

Operating System: Windows 8.1
Compiler: MinGW gcc

The code:

#include <stdio.h>

typedef struct {
    int a;
    int b;
} A;

int main() {
    FILE* f = fopen("file", "wb");
    A a;
    a.a = 2;
    a.b = 3;
    putw(1, f);
    fwrite(&a, sizeof(A), 1, f);
    fclose(f); // To make sure that wb mode and fwrite are not responsible
    f = fopen("file", "rb+");
    printf("initial position: %ld\n", ftell(f));
    putw(1, f);
    printf("position after putw: %ld\n", ftell(f));
    printf("fread result: %lu\n", (unsigned long)fread(&a, sizeof(A), 1, f)); // fread returns size_t, not int
    printf("position after 1st fread: %ld\n", ftell(f));
    printf("fread result: %lu\n", (unsigned long)fread(&a, sizeof(A), 1, f));
    printf("position after 2nd fread: %ld\n", ftell(f));
    fclose(f);
    remove("file");
    return 0;
}

RESULT:

initial position: 0
position after putw: 4
fread result: 1
position after 1st fread: 12
fread result: 1
position after 2nd fread: 20

Solution

  • The Problems

    There are a few issues in the code that can lead to undefined behavior:

    1. mixing wide- & byte-oriented functions,
    2. using the contents with position after a character that was written to a wide oriented stream (causing potential framing errors), and
    3. calling input functions after output functions without an intervening fflush.

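    Issue 2 is tricky to phrase succinctly; the C standard section quoted below should make it clearer. Note that putw is not an ISO C function: it is a legacy function that writes an int in binary form, and the standard's list of wide character functions (fputwc, putwc, fwprintf, and so on) doesn't include it. Issues 1 and 2 therefore apply only if the implementation treats putw as a wide character function; issue 3 applies regardless.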

    How stream orientation constrains the I/O functions is defined in C17 (draft) § 7.21.2, paragraphs 4 and 5:

    4 Each stream has an orientation. After a stream is associated with an external file, but before any operations are performed on it, the stream is without orientation. Once a wide character input/output function has been applied to a stream without orientation, the stream becomes a wide-oriented stream. Similarly, once a byte input/output function has been applied to a stream without orientation, the stream becomes a byte-oriented stream. Only a call to the freopen function or the fwide[*] function can otherwise alter the orientation of a stream. (A successful call to freopen removes any orientation.)

    5 Byte input/output functions shall not be applied to a wide-oriented stream and wide character input/output functions shall not be applied to a byte-oriented stream. The remaining stream operations do not affect, and are not affected by, a stream’s orientation, except for the following additional restrictions: […]

    — For wide-oriented streams, after a successful call to a file-positioning function that leaves the file position indicator prior to the end-of-file, a wide character output function can overwrite a partial multibyte character; any file contents beyond the byte(s) written are henceforth indeterminate.

    Mixing output & input without flushing is covered by C99 § 7.19.5.3 6 (fopen); C11 and C17 carry the same text as § 7.21.5.3 7:

    6 When a file is opened with update mode (’+’ as the second or third character in the above list of mode argument values), both input and output may be performed on the associated stream. However, output shall not be directly followed by input without an intervening call to the fflush function or to a file positioning function (fseek, fsetpos, or rewind), […]

    These are also listed in the Big List of Undefined Behavior, Annex J.2:

    The behavior is undefined in the following circumstances:

    […]

    — A byte input/output function is applied to a wide-oriented stream, or a wide character input/output function is applied to a byte-oriented stream (7.21.2).

    — Use is made of any portion of a file beyond the most recent wide character written to a wide-oriented stream (7.21.2).

    […]

    — An output operation on an update stream is followed by an input operation without an intervening call to the fflush function or a file positioning function, […] (7.19.5.3).

    The Solutions

    There are two approaches:

    • use freopen in between the wide-character and byte-oriented functions, or
    • use only byte-oriented functions (e.g. fwrite), and fflush (or fseek, as per the standard) in between writing & reading.

    Note fwide can only set the orientation of unoriented streams, so it can't address the issues; once the orientation of a stream is set, it can only be cleared with freopen.
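
    The orientation rules above can be observed directly, because fwide with a mode of 0 only queries the stream. The following is a minimal sketch, assuming an implementation that tracks orientation as the standard describes; orientation_of and orientation_demo are hypothetical names introduced for illustration.

```c
#include <stdio.h>
#include <wchar.h>

/* Classify a stream's orientation without changing it:
   fwide(stream, 0) only queries. Returns -1 (byte), 0 (none), 1 (wide). */
static int orientation_of(FILE *stream) {
    int o = fwide(stream, 0);
    return (o > 0) - (o < 0);
}

/* Hypothetical demo helper: records the orientation of a fresh temporary
   stream before any I/O, after a byte write, and after requesting wide
   orientation. Returns 0 on success, -1 if no temporary file was created. */
int orientation_demo(int out[3]) {
    FILE *f = tmpfile();
    if (f == NULL) return -1;
    out[0] = orientation_of(f);   /* no I/O yet: unoriented */
    fputc('x', f);                /* a byte output function */
    out[1] = orientation_of(f);   /* now byte-oriented */
    fwide(f, 1);                  /* request wide: too late to take effect */
    out[2] = orientation_of(f);   /* still byte-oriented */
    fclose(f);
    return 0;
}
```

    Calling orientation_demo from main and printing the three values shows the stream going from unoriented to byte-oriented and staying there, which is why freopen is the only way to reset it.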

    freopen Solution

    freopen on its own addresses 2 of the 3 issues:

    1. It clears the orientation in between the wide and byte oriented functions, so they're not mixed.
    2. On its own, freopen will leave any garbage characters in the tail of the file, though it shouldn't be an issue in the given example. If this is an issue, the stream must first be truncated (though this isn't appropriate for the example).
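    3. freopen closes the file before reopening it, which flushes pending output, so output is not directly followed by input.

        const char* fName = "file";
        f = fopen(fName, "rb+");
        putw(1, f);
        // truncate here, if applicable
        // A null file name asks freopen to change the stream's mode in place; which
        // mode changes are permitted is implementation-defined, so pass fName if this fails.
        if (freopen(NULL, "rb+", f)) {
            int nA;
            // the reopened stream starts at offset 0, so read the leading int first
            fread(&nA, sizeof(nA), 1, f);
            printf("fread result: %lu\n", (unsigned long)fread(&a, sizeof(A), 1, f)); // fread returns size_t
            printf("position after 1st fread: %ld\n", ftell(f));
            printf("fread result: %lu\n", (unsigned long)fread(&a, sizeof(A), 1, f));
            printf("position after 2nd fread: %ld\n", ftell(f));
        }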
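
    Because the standard leaves freopen with a null file name implementation-defined, a variant that reopens by name may be more portable. The following is a rough sketch under that assumption; write_reopen_read is a hypothetical helper, it uses fwrite rather than putw so that it stays within ISO C, and it expects the file to exist already.

```c
#include <stdio.h>

/* Hypothetical helper: writes one int at the start of an existing file,
   reopens the same stream by name, and reads the int back.
   Returns 0 and stores the value in *out on success, -1 on failure. */
int write_reopen_read(const char *path, int value, int *out) {
    FILE *f = fopen(path, "rb+");
    if (f == NULL) return -1;
    if (fwrite(&value, sizeof value, 1, f) != 1) {
        fclose(f);
        return -1;
    }
    /* freopen first closes the old file (flushing pending output), then
       opens path again. If it fails, the original stream is already
       closed, so f must not be used or closed again. */
    f = freopen(path, "rb+", f);
    if (f == NULL) return -1;
    /* The reopened stream starts at offset 0. */
    int ok = fread(out, sizeof *out, 1, f) == 1;
    fclose(f);
    return ok ? 0 : -1;
}
```

    Note that the return value of freopen is assigned back to f, so a failed reopen is never followed by a read on a dead stream.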
    

    Byte-Oriented I/O Solution

    Replacing putw with fwrite and adding a call to fflush addresses all 3 issues:

    1. No wide character functions are used, so there's no orientation mixing.
    2. With no wide character functions in use, the framing-error problem mentioned in § 7.21.2 5 doesn't arise.
    3. fflush explicitly addresses § 7.19.5.3 6.

        const char* fName = "file";
        f = fopen(fName, "rb+");
        int nA = 1;
        fwrite(&nA, sizeof(nA), 1, f);
        fflush(f); // required between output and input on an update stream
        printf("fread result: %lu\n", (unsigned long)fread(&a, sizeof(A), 1, f)); // fread returns size_t
        printf("position after 1st fread: %ld\n", ftell(f));
        printf("fread result: %lu\n", (unsigned long)fread(&a, sizeof(A), 1, f));
        printf("position after 2nd fread: %ld\n", ftell(f));
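
    The standard also accepts a file positioning function as the intervening call, so fseek(f, 0L, SEEK_CUR) can stand in for fflush. Here is a minimal sketch with fuller error checking that also reports the final file size, so the absence of growth can be verified; rewrite_then_read is a hypothetical helper, and it creates and removes the file at the given path itself.

```c
#include <stdio.h>

typedef struct {
    int a;
    int b;
} A;

/* Hypothetical helper: creates a file holding an int followed by an A,
   rewrites the leading int, then reads the struct that follows, using
   fseek as the intervening call between output and input. Stores the two
   fread results in got[0] and got[1] and returns the final file size in
   bytes, or -1 on any I/O error. */
long rewrite_then_read(const char *path, size_t got[2]) {
    A a = {2, 3};
    int n = 1;
    long size = -1;
    got[0] = got[1] = 0;

    FILE *f = fopen(path, "wb");
    if (f == NULL) return -1;
    if (fwrite(&n, sizeof n, 1, f) != 1 || fwrite(&a, sizeof a, 1, f) != 1) {
        fclose(f);
        return -1;
    }
    if (fclose(f) != 0) return -1;

    f = fopen(path, "rb+");
    if (f == NULL) return -1;
    if (fwrite(&n, sizeof n, 1, f) == 1 &&
        fseek(f, 0L, SEEK_CUR) == 0) {         /* switch from output to input */
        got[0] = fread(&a, sizeof a, 1, f);    /* the struct is present */
        got[1] = fread(&a, sizeof a, 1, f);    /* nothing left to read */
        if (fseek(f, 0L, SEEK_END) == 0)
            size = ftell(f);                   /* size after all the I/O */
    }
    fclose(f);
    remove(path);
    return size;
}
```

    With the intervening fseek in place, the second fread should report zero items and the returned size should equal sizeof(int) + sizeof(A), in contrast to the growing file in the question.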
    

    PS

    The toy example's putw followed directly by fread is unlikely to appear in production code, which matters little, since its purpose is to illustrate the issue. Consequently, the solutions above might not address every aspect of production code that mixes putw with fread.

    Only minimal error handling is shown in the sample code.