read() of big 6GB file fails on x86_64


Here is the description of my problem:

I want to read a big file, about 6.3GB, entirely into memory using the read system call in C, but an error occurs. Here is the code:

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <assert.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <limits.h>

int main(int argc, char* argv[]) {
    int _fd = open(argv[1], O_RDONLY, (mode_t) 0400);
    if (_fd == -1)
        return 1;
    off_t size = lseek(_fd, 0, SEEK_END);
    printf("total size: %lld\n", (long long) size);
    lseek(_fd, 0, SEEK_SET);
    char *buffer = malloc(size);
    assert(buffer);
    off_t total = 0;
    ssize_t ret = read(_fd, buffer, size);
    if (ret != size) {
        printf("read fail, %lld, reason:%s\n", (long long) ret, strerror(errno));
        printf("int max: %d\n", INT_MAX);
    }
}

And compile it with:

gcc read_test.c

then run with:

./a.out bigfile

output:

total size: 6685526352
read fail, 2147479552, reason:Success
int max: 2147483647

The system environment is:

 3.10.0_1-0-0-8 #1 SMP Thu Oct 29 13:04:32 CST 2015 x86_64 x86_64 x86_64 GNU/Linux

There are two things I don't understand:

  1. Reading fails on a big file, but not on a small file.
  2. Even though there is an error, errno does not seem to be set correctly.

Solution

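  • The read system call can return fewer bytes than requested for several reasons, and a positive, nonzero return value is not an error. On Linux in particular, a single read() call transfers at most 0x7ffff000 (2,147,479,552) bytes, on both 32-bit and 64-bit systems, which is exactly the count your program printed. Because no error occurred, errno is not meaningful here: its value after a successful call is unspecified, and strerror(errno) printed "Success" only because errno happened to still be 0. You should keep calling read in a loop until it returns 0 for end of file or -1 for an error. Relying on read to transfer a complete block in a single call is a very common bug, even with regular files. Alternatively, use fread, which has simpler semantics: it returns fewer items than requested only at end of file or on a read error.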

    You print the value of INT_MAX, which is irrelevant to your issue; the sizes of off_t and size_t are what matter. On your platform, 64-bit GNU/Linux, you are lucky that both off_t and size_t are 64 bits wide, and ssize_t, the signed counterpart of size_t, has the same size. On other platforms, off_t might be smaller than size_t, preventing a correct measurement of the file size, or size_t might be smaller than off_t, letting malloc allocate a block smaller than the file. In the latter case, read would be passed the same truncated size, because size would be silently converted the same way in both calls.
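The read loop described above can be sketched as follows. This is a minimal sketch assuming a POSIX system and a regular file; read_fully and read_whole_file are hypothetical helper names, not part of any library. It also swaps in fstat() instead of the question's lseek() to obtain the size (either works), and it rejects sizes that do not fit in ssize_t instead of silently truncating them.

```c
#define _POSIX_C_SOURCE 200809L /* expose POSIX declarations (ssize_t, SSIZE_MAX, fstat, ...) */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read up to `count` bytes, calling read() repeatedly
 * because one call may transfer fewer bytes than requested (on Linux, never
 * more than 0x7ffff000). Returns the number of bytes read, which is less than
 * `count` only at end of file, or -1 with errno set by the failing call. */
ssize_t read_fully(int fd, void *buf, size_t count)
{
    assert(buf != NULL || count == 0);
    if (count > (size_t)SSIZE_MAX) {      /* the total must fit in ssize_t */
        errno = EINVAL;
        return -1;
    }
    char *p = buf;
    size_t done = 0;
    while (done < count) {
        ssize_t n = read(fd, p + done, count - done);
        if (n == -1) {
            if (errno == EINTR)
                continue;                 /* interrupted by a signal: retry */
            return -1;                    /* real error: errno is meaningful */
        }
        if (n == 0)
            break;                        /* end of file */
        done += (size_t)n;
    }
    return (ssize_t)done;
}

/* Hypothetical helper: load a whole file into a malloc'd buffer and store its
 * length in *out_len. Returns NULL with errno set on failure. */
char *read_whole_file(const char *path, size_t *out_len)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) == -1) {
        int err = errno;
        close(fd);
        errno = err;
        return NULL;
    }
    /* Refuse sizes that size_t/ssize_t cannot represent. */
    if (st.st_size < 0 || (uintmax_t)st.st_size > (uintmax_t)SSIZE_MAX) {
        close(fd);
        errno = EOVERFLOW;
        return NULL;
    }

    size_t size = (size_t)st.st_size;
    char *buf = malloc(size > 0 ? size : 1);  /* avoid malloc(0) */
    if (buf == NULL) {
        close(fd);
        errno = ENOMEM;
        return NULL;
    }

    ssize_t got = read_fully(fd, buf, size);
    int saved = errno;
    close(fd);
    if (got == -1) {
        free(buf);
        errno = saved;
        return NULL;
    }
    *out_len = (size_t)got;  /* shorter than size only if the file shrank */
    return buf;
}
```

With this loop, the 6685526352-byte file from the question would be read in several read() calls on Linux, each transferring at most 2147479552 bytes, rather than stopping after the first one.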
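The fread alternative mentioned above can be sketched like this, assuming the expected size has already been obtained as in the question; fread_file is a hypothetical name. Because the C standard specifies that fread returns fewer items than requested only at end of file or on a read error, the caller needs no loop of its own and can use ferror() to tell the two cases apart.

```c
#define _POSIX_C_SOURCE 200809L /* expose POSIX declarations alongside ISO C ones */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: allocate `size` bytes and fill them from `path` with
 * fread(). Stores the number of bytes read in *out_len. Returns NULL on open,
 * allocation, or read failure. */
char *fread_file(const char *path, size_t size, size_t *out_len)
{
    assert(out_len != NULL);
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return NULL;
    char *buf = malloc(size > 0 ? size : 1);  /* avoid malloc(0) */
    if (buf == NULL) {
        fclose(f);
        return NULL;
    }
    /* fread() returns fewer than `size` bytes only at end of file or on error. */
    size_t got = fread(buf, 1, size, f);
    int failed = ferror(f);   /* distinguish a read error from end of file */
    fclose(f);
    if (failed) {
        free(buf);
        return NULL;
    }
    *out_len = got;
    return buf;
}
```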
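The point about the widths of off_t, size_t and ssize_t can be checked directly on a given platform. In this sketch, print_type_sizes and file_size_fits are hypothetical names; the static_assert encodes the answer's statement that ssize_t and size_t have the same size, so compilation fails on a platform where that does not hold.

```c
#define _POSIX_C_SOURCE 200809L /* expose SSIZE_MAX and ssize_t */
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>

/* Compile-time check of the "same size" claim (C11 static_assert). */
static_assert(sizeof(ssize_t) == sizeof(size_t), "ssize_t and size_t differ in width");

/* Print the widths of the types involved in lseek(), malloc() and read(). */
void print_type_sizes(void)
{
    printf("sizeof(off_t)   = %zu\n", sizeof(off_t));
    printf("sizeof(size_t)  = %zu\n", sizeof(size_t));
    printf("sizeof(ssize_t) = %zu\n", sizeof(ssize_t));
}

/* True if a file size held in off_t can be passed to malloc() and read()
 * unchanged: it must be non-negative and no larger than SSIZE_MAX, the
 * largest byte count read() can report. */
bool file_size_fits(off_t size)
{
    return size >= 0 && (uintmax_t)size <= (uintmax_t)SSIZE_MAX;
}
```

On x86_64 GNU/Linux all three types are 8 bytes wide, so a 6685526352-byte size passes this check there; on a platform where it fails, the program should refuse the file instead of calling malloc and read with a truncated size.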