
The fread function throws an exception when reading a binary file


This function receives the path of a file (main.c in my case) from a command-line argument, opens it in binary mode, reads its contents into the character array buffer, appends the terminator '\0' so the result can be used as a string, and returns it.

static char* readFile(const char* path) {
    FILE* file = fopen(path, "rb");
    if (!file) {
        fprintf(stderr, "Could not open file \" %s \".\n", path);
        exit(1);
    }

    fseek(file, 0L, SEEK_END);
    size_t fileSize = ftell(file);
    rewind(file);

    char* buffer = (char*)malloc(sizeof(char) * (fileSize + 1));
    if (!buffer) {
        fprintf(stderr, "Not enough memory to read \"%s\".\n", path);
        exit(1);
    }

    size_t bytesRead = fread(buffer, sizeof(char), fileSize, file); // ERROR!!!

    if (bytesRead < fileSize) {
        fprintf(stderr, "Could not read file \"%s\".\n", path);
        exit(1);
    }
    buffer[bytesRead] = '\0';
    
    fclose(file);
    return buffer;
}

The statement that raises the exception: size_t bytesRead = fread(buffer, sizeof(char), fileSize, file);

The call to fread raises an exception while reading the file. I tried casting fileSize to other types, but it does not seem to be a type problem: fileSize correctly holds the number of bytes in the file. I don't know how to fix this.


Solution

  • fseek(file, 0L, SEEK_END);
    size_t fileSize = ftell(file);
    rewind(file);
    

    Binary streams need not meaningfully support SEEK_END, and the ISO C Standard describes seeking to end-of-file on a binary stream as undefined behavior.

    Subclause 7.21.9.2 of the C Standard [ISO/IEC 9899:2011] specifies the following behavior for fseek() on a stream opened in binary mode:

    A binary stream need not meaningfully support fseek calls with a whence value of SEEK_END.

    In addition, footnote 268 of subclause 7.21.3 says:

    Setting the file position indicator to end-of-file, as with fseek(file, 0, SEEK_END), has undefined behavior for a binary stream (because of possible trailing null characters) or for any stream with state-dependent encoding that does not assuredly end in the initial shift state.

    Don't use fseek() + ftell() to find the size of the file. Use POSIX's stat()/fstat() (Yes, they're supported by Windows too, albeit with preceding underscores), or just allocate a fixed amount of memory and realloc() as needed until fread() returns a short count.
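    The "allocate, then realloc() as needed" alternative could look roughly like the sketch below. The helper name read_all, the 4096-byte starting capacity, and the doubling policy are assumptions made for illustration; they are not part of the question or of any library.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads the rest of `stream` into a heap buffer that
 * grows on demand. On success it returns the buffer and stores the byte
 * count in *out_len; on failure it returns NULL. The caller frees the result. */
static unsigned char *read_all(FILE *stream, size_t *out_len)
{
    size_t cap = 4096;   /* initial capacity, an arbitrary choice */
    size_t len = 0;
    unsigned char *buf = malloc(cap);

    if (!buf) {
        return NULL;
    }

    for (;;) {
        size_t want = cap - len;
        size_t got = fread(buf + len, 1, want, stream);
        len += got;

        if (got < want) {
            /* Short count: either end-of-file or a read error. */
            if (ferror(stream)) {
                free(buf);
                return NULL;
            }
            break;
        }

        /* Buffer is full; double it, guarding against size_t overflow. */
        if (cap > (size_t)-1 / 2) {
            free(buf);
            return NULL;
        }
        unsigned char *tmp = realloc(buf, cap * 2);
        if (!tmp) {
            free(buf);
            return NULL;
        }
        buf = tmp;
        cap *= 2;
    }

    *out_len = len;
    return buf;
}
```

    This version never asks the stream for its size, so it also works on pipes and other non-seekable inputs. It does not append a '\0'; the length is returned separately.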

    If it's a binary file, then you should expect that it may contain null bytes, and therefore you cannot rely on a null byte to indicate the end of the buffer. You need to rely on a buffer count. @Tom Karzes
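    To make the point about null bytes concrete, here is a small sketch that keeps the length next to the data. The struct blob type and both function names are hypothetical, introduced only for this illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container pairing a byte buffer with its explicit length. */
struct blob {
    const unsigned char *data;
    size_t len;
};

/* Returns how many bytes a strlen()-style scan would see before it stops
 * at the first null byte (or b.len if the data contains none). */
static size_t visible_as_string(struct blob b)
{
    const unsigned char *nul = memchr(b.data, '\0', b.len);
    return nul ? (size_t)(nul - b.data) : b.len;
}

/* Writes every byte of the blob, embedded null bytes included, by relying
 * on the stored count rather than on a terminator. */
static size_t write_blob(struct blob b, FILE *out)
{
    return fwrite(b.data, 1, b.len, out);
}
```

    For the six bytes 'E', 'L', 'F', 0x00, 0x01, 0x02, visible_as_string() reports 3 although the blob holds 6 bytes, which is why string functions silently truncate binary data.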

    Other than that, C doesn't have exceptions, so fread() cannot raise one. What you're referring to is probably something Windows-specific.
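    Because there are no exceptions, fread() signals trouble only through its return value, and feof()/ferror() tell the two short-count cases apart. A minimal sketch of that check follows; the read_status enum and the read_checked name are assumptions for illustration.

```c
#include <stdio.h>

/* Hypothetical status codes for this sketch. */
enum read_status { READ_OK, READ_EOF, READ_ERROR };

/* Reads up to `want` bytes and classifies the outcome from fread()'s return
 * value plus ferror(), which is how the C standard library reports failure. */
static enum read_status read_checked(FILE *stream, void *dst, size_t want,
                                     size_t *got)
{
    *got = fread(dst, 1, want, stream);
    if (*got == want) {
        return READ_OK;
    }
    if (ferror(stream)) {
        return READ_ERROR;
    }
    return READ_EOF;   /* short count with no error means end-of-file */
}
```

    A crash inside fread() itself is therefore not a reported read error; it has to come from somewhere outside this return-value mechanism.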


    Here's something you might find useful (for the links and the defines/includes):

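    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>
    /* Where fstat() is used below, <sys/types.h> and <sys/stat.h> are needed too. */

    bool io_fsize(FILE *stream, uintmax_t *size)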
    {
    /*   Windows supports fileno(), struct stat, and fstat() as _fileno(),
     *   _fstat(), and struct _stat.
     *
     *   See: https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/fstat-fstat32-fstat64-fstati64-fstat32i64-fstat64i32?view=msvc-170 */
    #ifdef _WIN32
        #define fileno _fileno
        #ifdef _WIN64
            #define fstat  _fstat64
            #define stat   __stat64
        #else
            #define fstat  _fstat
            #define stat   _stat
        #endif                          /* _WIN64 */
    #endif                              /* _WIN32 */
    
    /* According to https://web.archive.org/web/20191012035921/http://nadeausoftware.com/articles/2012/01/c_c_tip_how_use_compiler_predefined_macros_detect_operating_system
     * __unix__ should suffice for IBM AIX, all distributions of BSD, and all
     * distributions of Linux, and Hewlett-Packard HP-UX. __unix suffices for Oracle
     * Solaris. Mac OSX and iOS compilers do not define the conventional __unix__,
     * __unix, or unix macros, so they're checked for separately. _WIN32 is
     * defined on 64-bit systems too. */
    #if defined(_WIN32) || defined(__unix__) || defined(__unix) || (defined(__APPLE__) && defined(__MACH__))
        struct stat st;
    
        /* rewind() returns no value. */
        rewind(stream);
    
        if (fstat(fileno(stream), &st) == 0) {
            *size = (uintmax_t) st.st_size;
            return true;
        }
        return false;
    #else
        /* Fall back to the default and read it in chunks. */
        uintmax_t rcount = 0;
        char chunk[IO_CHUNK_SIZE];
    
        /* Start the running total at zero; the caller may pass an
         * uninitialized variable. */
        *size = 0;
    
        /* rewind() returns no value. */
        rewind(stream);
    
        do {
            rcount = fread(chunk, 1, IO_CHUNK_SIZE, stream);
    
            if ((*size + rcount) < *size) {
                /* Overflow. */
                return false;
            }
            *size += rcount;
        } while (rcount == IO_CHUNK_SIZE);
        return !ferror(stream);
    #endif                          /* defined(_WIN32) || defined(__unix__) || defined(__unix) || (defined(__APPLE__) && defined(__MACH__)) */
    #undef fstat
    #undef stat
    #undef fileno
    }
    

    Here IO_CHUNK_SIZE is defined as something like 8 KiB or 64 KiB.