Embedded Linux: readdir() sometimes fails with EFAULT


I've been seeing readdir() failures in an embedded app, so I added this self-contained test at a convenient place in the app code:

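/* Needs <dirent.h>, <errno.h>, <stdio.h> and <string.h>. */
FILE *f;
DIR *d;

/* Open a file inside the directory first. */
f = fopen ("/mnt/mydir/myfile", "r");
printf ("fopen %p\r\n", (void *) f);
if (f) fclose (f);

/* Then open the directory itself and walk its entries. */
d = opendir ("/mnt/mydir");
printf ("opendir ret %p\r\n", (void *) d);
if (d)
{
    struct dirent *entry;
    do
    {
        /* Clear errno so a NULL return caused by an error can be told
           apart from a normal end-of-directory. */
        errno = 0;
        entry = readdir (d);
        printf ("readdir ret %p %s, errno %d %s\r\n", (void *) entry, entry ? entry->d_name : "", errno, strerror (errno));
    } while (entry);
    closedir (d);
}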

/mnt/mydir is an NFS mount (although I'm not sure whether that's relevant). The fopen() call on a file in that directory always succeeds, and so does the opendir() on the directory itself. However, most of the time the readdir() fails with errno=EFAULT.

I don't believe anything else in the app touches that directory. The test is exactly as written, and all variables are locals on the stack.

If I run it as a standalone program, it always succeeds.

Can anyone suggest what could cause EFAULT here? I'm pretty sure my DIR pointer variable is not being corrupted, although I guess the DIR structure itself could be. I haven't seen any evidence of heap corruption elsewhere.


Solution

  • I think I found the problem. The uClibc implementation of opendir()/readdir() does a stat() on the directory and later uses alloca() to reserve statbuf.st_blksize bytes on the stack. My NFS directory was mounted with rsize=512KB, so readdir() tried to allocate 512KB on the stack to hold the directory entries. My embedded setup doesn't have that much room between stacks, so the allocation ran into whatever sits below the stack in memory, and the call failed with EFAULT.

    If I change my NFS mount options to rsize=4096, it works fine.
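
As a quick way to check whether a given mount is likely to be affected, something along these lines could be dropped in before the opendir() call. It is only a sketch: dir_blksize() and blksize_fits_stack() are made-up helper names, and the stack budget is a figure you pick yourself for your own thread or task stacks, not something uClibc reports.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Return the st_blksize that stat() reports for path, or -1 if stat() fails. */
long dir_blksize(const char *path)
{
    struct stat sb;

    if (stat(path, &sb) != 0)
        return -1;
    return (long) sb.st_blksize;
}

/* Compare the reported block size against a caller-chosen stack budget.
   stack_budget is a hypothetical figure: how many bytes you are willing to
   let a single stack allocation take on the calling thread.
   Returns 1 if st_blksize fits, 0 if it does not, -1 if stat() fails. */
int blksize_fits_stack(const char *path, long stack_budget)
{
    long blk = dir_blksize(path);

    if (blk < 0)
        return -1;
    printf("%s: st_blksize %ld bytes, stack budget %ld bytes\r\n",
           path, blk, stack_budget);
    return blk <= stack_budget ? 1 : 0;
}
```

For example, blksize_fits_stack("/mnt/mydir", 16 * 1024) would return 0 for the mount described above, assuming the directory reports an st_blksize that follows the 512KB rsize as it appeared to in my case.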