Here is the description of my problem:
I want to read a big file, about 6.3GB, entirely into memory using the read
system call in C, but an error occurs.
Here is the code:
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <assert.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <limits.h>
int main(int argc, char* argv[]) {
    int _fd = open(argv[1], O_RDONLY, (mode_t) 0400);
    if (_fd == -1)
        return 1;
    off_t size = lseek(_fd, 0, SEEK_END);
    /* off_t and ssize_t have no portable printf conversion: cast to long long */
    printf("total size: %lld\n", (long long) size);
    lseek(_fd, 0, SEEK_SET);
    char *buffer = malloc(size);
    assert(buffer);
    off_t total = 0;
    ssize_t ret = read(_fd, buffer, size);
    if (ret != size) {
        printf("read fail, %lld, reason:%s\n", (long long) ret, strerror(errno));
        printf("int max: %d\n", INT_MAX);
    }
}
And compile it with:
gcc read_test.c
then run with:
./a.out bigfile
output:
total size: 6685526352
read fail, 2147479552, reason:Success
int max: 2147483647
The system environment is
3.10.0_1-0-0-8 #1 SMP Thu Oct 29 13:04:32 CST 2015 x86_64 x86_64 x86_64 GNU/Linux
There are two places I don't understand:
errno is not correctly set.

The read system call can return fewer bytes than requested for several reasons. A positive return value is not an error, and errno is not set in this case, so its value is indeterminate. You should keep reading in a loop until read returns 0 for end of file or -1 for an error. Relying on read to deliver a complete block in a single call is a very common bug, even with regular files. Use fread for simpler semantics.
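
The loop described above could be sketched as follows. This is only an illustration: read_full is a hypothetical helper name, not a standard function, and it assumes count does not exceed SSIZE_MAX. On Linux a single read transfers at most 0x7ffff000 (2147479552) bytes, which matches the short count in the output shown in the question, so a 6 GB request needs several iterations.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read up to `count` bytes from `fd` into `buf`,
 * looping until the request is satisfied, end of file, or an error.
 * Returns the number of bytes read (less than `count` only at end of
 * file), or -1 on error with errno set by read.
 * Assumes count <= SSIZE_MAX. */
ssize_t read_full(int fd, void *buf, size_t count) {
    char *p = buf;
    size_t total = 0;

    while (total < count) {
        ssize_t n = read(fd, p + total, count - total);
        if (n == 0)
            break;              /* end of file */
        if (n == -1) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;          /* real error: errno is meaningful here */
        }
        total += (size_t)n;     /* short read: keep going */
    }
    return (ssize_t)total;
}
```

In the program from the question, the single read call would become something like ret = read_full(_fd, buffer, size), and errno would be consulted only when the result is -1.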

You print the value of INT_MAX, which is irrelevant to your issue. The sizes of off_t and size_t are the interesting ones. On your platform, 64-bit GNU/Linux, you are lucky that both off_t and size_t are 64 bits wide. ssize_t has the same size as size_t by definition. On other 64-bit platforms, off_t might be smaller than size_t, preventing correct assessment of the file size, or size_t might be smaller than off_t, letting malloc allocate a block smaller than the file size. Note that in this case, read will be passed the same smaller size because size would be silently truncated in both calls.
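
One way to guard against the silent truncation mentioned above is to check that the file size fits in size_t before converting it. The sketch below is an assumption about how such a check might look. alloc_for_file is a hypothetical name, and the caller is assumed to have obtained size from lseek or fstat.

```c
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: allocate a buffer for a file of `size` bytes.
 * Returns NULL if `size` is negative (for example, lseek failed),
 * if it does not fit in size_t, or if malloc fails. */
void *alloc_for_file(off_t size) {
    if (size < 0)
        return NULL;
    if ((uintmax_t)size > SIZE_MAX)
        return NULL;            /* would be truncated when passed to malloc */
    /* malloc(0) may legitimately return NULL, so request at least 1 byte */
    return malloc(size > 0 ? (size_t)size : 1);
}
```

On 64-bit GNU/Linux the range check never fails, since both types are 64 bits wide, but on a platform where size_t is narrower than off_t it turns a silent truncation into an explicit failure.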