Tags: c, linux, file-descriptor

Is there a way to test for end of file in less than 3 syscalls?


I want to test whether the file position of a given fd is at the end of the file, i.e. current position == file size. Is there a way to do this in fewer than 3 syscalls? The 3 steps being:

  1. Get current position with lseek
  2. lseek to end of file and store that position (i.e. the file size)
  3. Compare the two, and if they're different, lseek back to the original position.
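
For reference, the three steps above might be sketched in C roughly like this. The function name at_eof_lseek and its return convention are made up for illustration, and the sketch assumes fd is seekable (a regular file, not a pipe or socket):

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if fd's offset equals the file size,
 * 0 if it does not, and -1 if any lseek call fails. */
int at_eof_lseek(int fd)
{
    off_t cur = lseek(fd, 0, SEEK_CUR);   /* step 1: current offset */
    if (cur == (off_t)-1)
        return -1;

    off_t end = lseek(fd, 0, SEEK_END);   /* step 2: offset of the end, i.e. the size */
    if (end == (off_t)-1)
        return -1;

    if (cur == end)
        return 1;                         /* already at the end, nothing to restore */

    /* step 3: the offsets differ, so go back to where we were */
    if (lseek(fd, cur, SEEK_SET) == (off_t)-1)
        return -1;
    return 0;
}
```

When the offset is already at the end, only two lseek calls are made; the third is needed only to restore a differing position.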

Solution

  • You can test for end-of-file in just one syscall: a single read! If it returns 0, you're at end-of-file. If it doesn't, you weren't.

    ...and, of course, if it returns greater than 0, you're not where you were any more, so this might not be a good solution. But if your primary task was reading the file, then the data you've just read with your one read call is quite likely to be data you wanted anyway.
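
    As a minimal sketch of that single-read test (the helper name read_byte_or_eof is hypothetical, and it reads just one byte so that as little as possible is consumed when you are not at end-of-file):

    ```c
    #include <unistd.h>

    /* Hypothetical helper: tries to read one byte from fd.
     * Returns 1 at end-of-file, 0 if a byte was stored in *out
     * (the offset has then advanced by one), and -1 on error. */
    int read_byte_or_eof(int fd, unsigned char *out)
    {
        ssize_t n = read(fd, out, 1);   /* the single syscall */
        if (n < 0)
            return -1;
        return n == 0;
    }
    ```

    A return of 0 means the caller now holds a byte it has to deal with, which is exactly the trade-off described above.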

    In a comment you said that code that merely calls read can be "convoluted and produce code that is harder to work with", and I kind of know what you mean. I can vaguely remember, once or twice in my career, wishing I could know whether the next read was going to succeed, before I had to do it. But that was just once or twice. The vast, vast majority of the time, for me at least, code that just reads reads reads until one read call returns 0 ends up being perfectly natural and straightforward.


    Addendum:

    There's some pseudocode from K&R that always sticks with me, for the basic version of grep that they introduce as an example in a fairly early chapter:

    while (there's another line) {
        if (line contains pattern) {
            print it;
        }
    }
    

    That's for line-based input, but the more-general pattern

    while (there's some input)
        process it;
    

    has equal merit, and the fleshing-out to an actual read call doesn't involve that big a change:

    while ((n = read(fd, buf, bufsize)) > 0) {
        process n bytes from buf;
    }
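
    Filled in a little further, the loop might look like the sketch below. Here the "processing" is just counting bytes; the function name count_bytes and the 4096-byte buffer are arbitrary choices for illustration, and retrying on EINTR is left out to keep it short:

    ```c
    #include <unistd.h>

    /* Hypothetical example: reads fd until end-of-file and returns the
     * number of bytes seen, or -1 if a read call fails. */
    long count_bytes(int fd)
    {
        char buf[4096];
        ssize_t n;
        long total = 0;

        /* read returns > 0 while there is input, 0 at EOF, -1 on error */
        while ((n = read(fd, buf, sizeof buf)) > 0) {
            total += n;                 /* "process" the n bytes in buf */
        }

        return n == 0 ? total : -1;     /* tell EOF apart from an error */
    }
    ```

    Since the loop exits on both EOF and error, checking n afterwards is what distinguishes the two cases.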
    

    At first the embedded read-and-test (that is, the assignment to n and the test against 0, buried in the single control expression of the while loop) used to really bug me; it seemed unnecessarily cryptic. But it really, really does encapsulate the "while there's input / process it" idiom rather perfectly, at least given a C/Unix-style read call that can only indicate EOF after you call it.

    (This is by contrast to Pascal-style I/O, which does indicate EOF before you call it, and which is, or used to be, a prime motivator for all the questions that led to "Why is while( !feof(file) ) always wrong?" being a canonical SO question. Brian Kernighan has a description, probably in "Why Pascal Is Not My Favorite Programming Language", of how frustratingly difficult and unnatural it is to implement a Pascal-style input methodology that can explicitly indicate EOF before it happens.)