Tags: c, unix, posix, system-calls, errno

For wrappers around syscalls that retry on EINTR, how many times does retrying make sense?


Syscalls such as write(2), read(2), and close(2) often fail with errno set to EINTR after being interrupted by a signal (say the terminal window was resized and SIGWINCH was received). This is a transient error and ought to be retried, so code often wraps these syscalls in helpers that retry on EINTR (and often on EAGAIN or ENOBUFS as well).
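For concreteness, the kind of wrapper meant here usually looks something like the sketch below. The names write_retry and read_retry are hypothetical, chosen only for illustration, and the loops deliberately have no retry cap, which is exactly the behaviour being asked about.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical wrapper: repeat write(2) for as long as it fails with EINTR.
   There is no upper bound on the number of retries. */
ssize_t write_retry(int fd, const void *buf, size_t len)
{
    ssize_t n;
    do {
        n = write(fd, buf, len);
    } while (n < 0 && errno == EINTR);
    return n; /* bytes written (possibly fewer than len), or -1 on a real error */
}

/* Hypothetical wrapper: the same pattern applied to read(2). */
ssize_t read_retry(int fd, void *buf, size_t len)
{
    ssize_t n;
    do {
        n = read(fd, buf, len);
    } while (n < 0 && errno == EINTR);
    return n; /* bytes read, 0 at end of file, or -1 on a real error */
}
```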

But in theory the code can get stuck looping forever on EINTR, either because signals keep arriving non-stop or because the syscall was intercepted by a custom implementation that just returns EINTR.

In such cases, in library code, how many times does it make sense to retry the syscall?


Solution

  • For wrappers around syscalls that retry on EINTR, how many times does retrying make sense?

    From zero to infinitely many.

    The glibc standard library is in use on billions of devices around the world. Calling printf("Hello world\n"); ends up in the _IO_new_file_write function, which looks like the following, from https://github.com/bminor/glibc/blob/5aa2f79691ca6a40a59dfd4a2d6f7baff6917eb7/libio/fileops.c#L1176 :

    ssize_t
    _IO_new_file_write (FILE *f, const void *data, ssize_t n)
    {
      ssize_t to_do = n;
      while (to_do > 0)
        {
          ssize_t count = (__builtin_expect (f->_flags2
                                             & _IO_FLAGS2_NOTCANCEL, 0)
                           ? __write_nocancel (f->_fileno, data, to_do)
                           : __write (f->_fileno, data, to_do));
          if (count < 0)
            {
              f->_flags |= _IO_ERR_SEEN;
              break;
            }
          to_do -= count;
          data = (void *) ((char *) data + count);
        }
      n -= to_do;
      if (f->_offset >= 0)
        f->_offset += n;
      return n;
    }
    

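    As you can see, the while (to_do > 0) loop has no retry limit: it keeps calling write for as long as data remains and each call makes progress. It never checks for EINTR specifically; any negative return, EINTR included, just sets the error flag and ends the loop.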

    Because this code runs on almost every Linux device in the world, it is safe to say that looping without an upper bound is completely fine.

    Now, you may be working with a non-standard implementation of write. On an embedded device, for example, the programmer may supply their own write, such as _write_r when using the Newlib C standard library. If such a programmer endlessly sets errno = EINTR and returns 0 from their write function, I would say that's on them. If you want to detect such situations, go ahead, but I do not see the need for it.

    The contract of the write function is simply this: when the number of bytes written is less than the number you asked for, you repeat the call with the data pointer advanced and the count reduced. That's that.
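If you do decide to guard against a write that never makes progress, a minimal sketch could look like the following. Everything here is an assumption made for illustration: the names write_all_bounded, write_fn, stuck_write and stuck_calls are hypothetical, the cap counts consecutive calls that made no progress (EINTR or a 0-byte result), and reporting the give-up case as -1 with errno set to EINTR is just one possible choice. The write function is passed in as a pointer only so that the broken implementation described above can be simulated.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Signature shared by write(2) and any replacement for it. */
typedef ssize_t (*write_fn)(int fd, const void *buf, size_t len);

/* Hypothetical helper: write all len bytes, following the usual contract
   (advance the pointer, shrink the count), but give up after max_stalls
   consecutive calls that made no progress. */
ssize_t write_all_bounded(write_fn do_write, int fd,
                          const void *buf, size_t len, unsigned max_stalls)
{
    const char *p = buf;
    size_t left = len;
    unsigned stalls = 0;

    while (left > 0) {
        ssize_t n = do_write(fd, p, left);
        if (n > 0) {
            p += n;              /* shift the data ... */
            left -= (size_t)n;   /* ... and reduce the count */
            stalls = 0;          /* progress resets the stall counter */
            continue;
        }
        if (n < 0 && errno != EINTR)
            return -1;           /* a real error: report it unchanged */
        if (++stalls >= max_stalls) {
            errno = EINTR;       /* assumption: surface the stall as EINTR */
            return -1;
        }
    }
    return (ssize_t)len;
}

/* Deliberately broken write, as described above: it sets errno = EINTR
   and returns 0 forever. Used only to exercise the cap. */
int stuck_calls = 0;

ssize_t stuck_write(int fd, const void *buf, size_t len)
{
    (void)fd; (void)buf; (void)len;
    stuck_calls++;
    errno = EINTR;
    return 0;
}
```

Calling write_all_bounded(write, fd, buf, len, 16) behaves like an ordinary write-everything loop on a working system, while passing stuck_write makes it return -1 after max_stalls attempts instead of spinning forever. Whether that extra check is worth having in library code is the judgment call discussed above.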