Tags: c, macos, driver, kernel-extension, xnu

vnode and file descriptor in xnu: where is the file operation vector stored?


In xnu we have the vnode_t entity, which represents a file globally.

Each process can access the file (assuming it has the right permissions) by creating a new file descriptor and setting the vnode under fg_data:

fp->f_fglob->fg_data = vp;

The vnode contains a vector of basic actions for all relevant operations, set according to the file's filesystem; e.g. the HFS+ driver implements such a vector and sets its vnodes accordingly:

int     (**v_op)(void *);       /* vnode operations vector */

This is a vector of function pointers for all actions that may operate on the vnode.
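
For illustration, here is a sketch of how a filesystem driver typically declares this vector, modeled on the HFS+ driver's table in the xnu sources; the example_* names are hypothetical. The VFS layer builds the v_op vector from this table when the filesystem is registered:

/* Hypothetical filesystem: declare handlers and map them onto the
 * standard vnode operation descriptors. */
int example_vnop_lookup(struct vnop_lookup_args *ap);
int example_vnop_read(struct vnop_read_args *ap);
int example_vnop_write(struct vnop_write_args *ap);

int (**example_vnodeop_p)(void *);   /* becomes v_op; filled in by the VFS layer */

struct vnodeopv_entry_desc example_vnodeop_entries[] = {
    { &vnop_default_desc, (int (*)(void *))vn_default_error },
    { &vnop_lookup_desc,  (int (*)(void *))example_vnop_lookup },
    { &vnop_read_desc,    (int (*)(void *))example_vnop_read },
    { &vnop_write_desc,   (int (*)(void *))example_vnop_write },
    /* ... one entry per operation the filesystem supports ... */
    { NULL, (int (*)(void *))NULL }
};

struct vnodeopv_desc example_vnodeop_opv_desc = {
    &example_vnodeop_p, example_vnodeop_entries
};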

In addition, we have the fileops struct, which is part of the file descriptor (struct fileglob) and describes a minimal subset of these functions.

Here is a typical definition:

const struct fileops vnops = {
    .fo_type = DTYPE_VNODE,
    .fo_read = vn_read,
    .fo_write = vn_write,
    .fo_ioctl = vn_ioctl,
    .fo_select = vn_select,
    .fo_close = vn_closefile,
    .fo_kqfilter = vn_kqfilt_add,
    .fo_drain = NULL,
};

and we set it here:

fp->f_fglob->fg_ops = &vnops;
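
For context, here is a condensed sketch of the relevant steps in the open() path, paraphrased from xnu's open1() in vfs_syscalls.c (error handling and several arguments omitted):

falloc(p, &fp, &indx, ctx);             /* allocate the fileproc + fileglob */
vn_open_auth(ndp, &flags, &va);         /* resolve the path to a vnode vp */

fp->f_fglob->fg_flag = flags & FMASK;   /* per-open flags live in the fileglob */
fp->f_fglob->fg_ops  = &vnops;          /* file-level dispatch goes to the vnode layer */
fp->f_fglob->fg_data = (caddr_t)vp;     /* the vnode itself */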

I saw that when reading a regular file on a local filesystem (HFS+), the operation goes through the file descriptor vector and not the vnode's ...

 * frame #0: 0xffffff801313c67c kernel`vn_read(fp=0xffffff801f004d98, uio=0xffffff807240be70, flags=0, ctx=0xffffff807240bf10) at vfs_vnops.c:978 [opt]
frame #1: 0xffffff801339cc1a kernel`dofileread [inlined] fo_read(fp=0xffffff801f004d98, uio=0xffffff807240be70, flags=0, ctx=0xffffff807240bf10) at kern_descrip.c:5832 [opt]
frame #2: 0xffffff801339cbff kernel`dofileread(ctx=0xffffff807240bf10, fp=0xffffff801f004d98, bufp=140222138463456, nbyte=282, offset=<unavailable>, flags=<unavailable>, retval=<unavailable>) at sys_generic.c:365 [opt]
frame #3: 0xffffff801339c983 kernel`read_nocancel(p=0xffffff801a597658, uap=0xffffff801a553cc0, retval=<unavailable>) at sys_generic.c:215 [opt]
frame #4: 0xffffff8013425695 kernel`unix_syscall64(state=<unavailable>) at systemcalls.c:376 [opt]
frame #5: 0xffffff8012e9dd46 kernel`hndl_unix_scall64 + 22

My question is: why is this duality needed, and in which cases does an operation go through the file descriptor vector (fg_ops) versus the vnode vector (vp->v_op)?

thanks


Solution

  • […] in which cases does an operation go through the file descriptor vector (fg_ops) versus the vnode vector (vp->v_op).

    I'm going to start by answering the second part of the question first: if you trace further through your call stack and look inside the vn_read function, you'll find that it contains this line:

    error = VNOP_READ(vp, uio, ioflag, ctx);
    

    The VNOP_READ function (kpi_vfs.c) in turn has this:

    _err = (*vp->v_op[vnop_read_desc.vdesc_offset])(&a);
    

    So the answer to your question is that for your typical file, both tables are used for dispatching operations.
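
    To make the first hop concrete: fo_read() in kern_descrip.c (visible in your backtrace) is essentially just an indirect call through the fileops table. Simplified from the xnu sources:

    int
    fo_read(struct fileproc *fp, struct uio *uio, int flags, vfs_context_t ctx)
    {
        return (*fp->f_ops->fo_read)(fp, uio, flags, ctx);  /* f_ops is a shorthand for f_fglob->fg_ops */
    }

    So for a regular file on HFS+ the full chain is roughly: read → dofileread → fo_read (via fg_ops) → vn_read → VNOP_READ (via v_op) → the HFS+ read implementation.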

    With that out of the way,

    My question is: why is this duality needed […]

    Not everything to which a process can hold a file descriptor is also represented in the file system. For example, pipes don't necessarily have to be named. A vnode doesn't make any sense in that context. So in sys_pipe.c, you'll see a different fileops table:

    static const struct fileops pipeops = {
        .fo_type = DTYPE_PIPE,
        .fo_read = pipe_read,
        .fo_write = pipe_write,
        .fo_ioctl = pipe_ioctl,
        .fo_select = pipe_select,
        .fo_close = pipe_close,
        .fo_kqfilter = pipe_kqfilter,
        .fo_drain = pipe_drain,
    };
    

    Similar deal for sockets.
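
    For instance, sys_socket.c defines a socketops table along the same lines; shown here approximately, from memory of the xnu sources:

    const struct fileops socketops = {
        .fo_type = DTYPE_SOCKET,
        .fo_read = soo_read,
        .fo_write = soo_write,
        .fo_ioctl = soo_ioctl,
        .fo_select = soo_select,
        .fo_close = soo_close,
        .fo_kqfilter = soo_kqfilter,
        .fo_drain = soo_drain,
    };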

    File descriptors track the state of a process's view of a file or other object that allows file-like operations: things like the current position in the file. Different processes can have the same file open, and each must have its own read/write position, so vnode:fileglob is a 1:many relationship.
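
    You can actually observe this per-fileglob state from user space. In this small demo program (hypothetical, but should compile on macOS or any Unix), two independent open()s of the same file get independent offsets (two fileglobs, one vnode), while a dup()ed descriptor shares a fileglob, and therefore an offset, with the original:

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        char buf[4];
        int a = open("/etc/hosts", O_RDONLY);  /* fileglob #1 */
        int b = open("/etc/hosts", O_RDONLY);  /* fileglob #2, same vnode */
        int c = dup(a);                        /* shares fileglob #1 with a */

        read(a, buf, sizeof buf);              /* advances fileglob #1's offset */

        printf("a=%lld b=%lld c=%lld\n",
               (long long)lseek(a, 0, SEEK_CUR),   /* 4: moved by the read */
               (long long)lseek(b, 0, SEEK_CUR),   /* 0: independent offset */
               (long long)lseek(c, 0, SEEK_CUR));  /* 4: shared with a */
        return 0;
    }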

    Meanwhile, using vnode objects to track things other than objects within a file system doesn't make any sense either. Additionally, the v_op table is file system specific, whereas vn_read/VNOP_READ contain code that applies to any file that's represented in a file system.
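
    To illustrate that split, here is the rough shape of vn_read, heavily simplified from vfs_vnops.c (locking and permission checks omitted). Everything in it is filesystem-independent; only the VNOP_READ dispatch enters filesystem code:

    static int
    vn_read(struct fileproc *fp, struct uio *uio, int flags, vfs_context_t ctx)
    {
        vnode_t vp = (vnode_t)fp->f_fglob->fg_data;     /* back to the vnode */
        int ioflag = 0;
        user_ssize_t count;
        int error;

        if (fp->f_fglob->fg_flag & FNONBLOCK)
            ioflag |= IO_NDELAY;                        /* translate open flags */

        if ((flags & FOF_OFFSET) == 0)
            uio_setoffset(uio, fp->f_fglob->fg_offset); /* per-open file position */
        count = uio_resid(uio);

        error = VNOP_READ(vp, uio, ioflag, ctx);        /* into the filesystem */

        if ((flags & FOF_OFFSET) == 0)
            fp->f_fglob->fg_offset += count - uio_resid(uio); /* advance position */

        return error;
    }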

    So in summary they're really just different layers in the I/O stack.