tcl nfs mkdir

Atomicity of mkdir


I'm encountering an odd issue on an NFS v3 file system (I suspect this is important) with two processes running the following in parallel. Going by the comment below and my own knowledge of the matter, I don't think the language matters, and I think the code is readable enough:

if { ! [file isdirectory $dir]} {
    if {[catch { file mkdir $dir} err]} {
        error "-E- failed to mkdir $dir: $err"
    }
} 

For those not familiar, file mkdir in Tcl behaves much like mkdir -p: it should only fail if the path exists and is not a directory. I'm nearly 100% sure (there is never a full 100%) that nothing in any process creates that path except file mkdir. The problem does not always happen, but it happens often enough that our regression runs occasionally hit:

Error: can't create directory "$dir": file already exists

This should only happen if $dir is an existing non-directory file while file mkdir is processing it. Two questions, the first being the more important to me:

  1. Is mkdir not atomic here? In particular could the file node in the filesystem exist as a non-directory for any amount of time during creation?
  2. Assuming this really is the error, is there a simple atomic way to do this? I thought about exec mkdir -p, but if I'm right this will suffer from the same problem.
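
As a hedged illustration for question 1: a single mkdir call either creates the directory or it does not, but a check followed by a create is two steps, and another process can slip in between them. The sketch below replays that interleaving deterministically in one shell. The labels "A" and "B" stand for the two racing processes and are purely illustrative, and whether file mkdir performs exactly this sequence internally is an assumption, not something confirmed here.

```shell
#!/bin/sh
# Deterministic replay of the suspected interleaving, in a single process.
# "A" and "B" are hypothetical labels for the two racing processes.
work=$(mktemp -d)          # scratch area, so nothing real is touched
dir="$work/bla"

# A: the existence check runs first and sees nothing.
if [ ! -d "$dir" ]; then a_check="missing"; else a_check="present"; fi

# B: creates the directory inside A's check-to-create window.
mkdir "$dir"

# A: acts on its stale check. Plain mkdir refuses any existing path,
# directory or not, so this fails although no regular file ever existed.
if mkdir "$dir" 2>/dev/null; then a_create="ok"; else a_create="failed"; fi

# The path is a directory, created by B.
if [ -d "$dir" ]; then final_type="directory"; else final_type="other"; fi

echo "check=$a_check create=$a_create final=$final_type"
rm -rf "$work"
```

The point of the replay is that an "already exists" failure does not by itself prove the path was ever a non-directory; it can also mean another process created the directory first.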

This is hard enough to reproduce that I'd rather be as sure as I can before I attempt a fix. I came here after following a hint that NFS may be the issue, but I need more expert advice. I don't care if both processes succeed; I just don't want either to fail (on the first try).

Final note

I circled back to this after a long while, and this is indeed a Tcl issue. It is not limited to NFS, though NFS seems to make it worse.

Still looking for answers explaining why I'm seeing what I'm seeing - see answer.

Opened this as a bug

https://core.tcl.tk/tcl/tktview/270f78ca95b642fbed81ed03ad381d64a0d0f7df

Bug already fixed!

The people at Tcl core are fast!


Solution

  • The guys and girls at Tcl core fixed this a day after I posted the bug!

    https://core.tcl.tk/tcl/tktview/270f78ca95b642fbed81ed03ad381d64a0d0f7df

    Fixed in 1c12ee9e45222d6c.

    Thanks to mrcalvin for the suggestion.


    The old testing attempts:

    After a long while I circled back to this and ran the following tests (on ext4):

    Two terminals with tclsh:

    1: while {1} {file mkdir bla}
    2: while {1} {file mkdir bla; file delete bla}
    

    Eventually terminal 1 fails with:

    can't create directory "bla": no such file or directory
    

    Two terminals with tclsh:

    1: while {1} {exec mkdir -p bla}
    2: while {1} {exec mkdir -p bla; file delete bla}
    

    No error.

    One terminal Bash one tclsh:

    1: while [ 1 ]; do mkdir -p bla; done
    2: while {1} {file mkdir bla; file delete bla}
    

    Eventually I get this on terminal 1:

    mkdir: cannot create directory ‘bla’: File exists
    

    But oddly enough,

    1: while [ 1 ]; do mkdir -p bla; rm -rf bla; done
    2: while {1} {file mkdir bla}
    

    gives no error (is file delete the culprit?), and

    1: while [ 1 ]; do mkdir -p bla; done
    2: while {1} {exec mkdir -p bla; file delete bla}
    

    has a much lower chance of error (so is delete not as bad?). Of course, two Bash shells do not conflict:

    1: while [ 1 ]; do mkdir -p bla; rm -rf bla; done
    2: while [ 1 ]; do mkdir -p bla; done
    

    On NFS, but not on ext4,

    1: while {1} {file mkdir bla; exec rm -rf bla}
    2: while {1} {file mkdir bla}
    

    fails with

    can't create directory "bla": file already exists
    

    in either terminal (at random).

    Conclusion

    file mkdir is not as "thin" a layer as I thought, and it can produce race conditions in which one file mkdir treats a directory that another process is creating as a file. file delete may have the same or a similar issue; it may have contributed to the failures in my tests, but not to the failure in my original question. The problem is worse on NFS, where file mkdir alone easily reproduces the error.

    The solution is to use exec mkdir -p. So far this is working for us across the board.
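
A minimal sketch of the race-tolerant pattern behind that workaround: attempt the create first, and only if it fails ask whether the path is a directory now. The helper name ensure_dir is hypothetical, and this illustrates the general technique rather than how mkdir -p or the Tcl fix is actually implemented. Unlike mkdir -p, it creates only the last path component.

```shell
#!/bin/sh
# ensure_dir is a hypothetical helper name, not part of any tool.
# Create first; if that fails, succeed anyway when the path is a directory.
ensure_dir() {
    mkdir "$1" 2>/dev/null || [ -d "$1" ]
}

work=$(mktemp -d)                      # scratch area for the demonstration

ensure_dir "$work/bla";   first=$?     # fresh path: created
ensure_dir "$work/bla";   second=$?    # already a directory: still success
: > "$work/plain"                      # put a regular file in the way
ensure_dir "$work/plain"; blocked=$?   # genuine conflict: reported as failure

echo "first=$first second=$second blocked=$blocked"
rm -rf "$work"
```

Because the directory test runs after the failed create rather than before it, a competing creator no longer causes a spurious failure. A competing deleter can still remove the directory between the two steps, which matches the occasional "File exists" seen in the tests above that involve file delete.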