How to iterate over a series of directories, running a program and redirecting its output to the input file instead of stdout? (fish)


Essentially, I want to run a program that takes FILE as input over a whole directory tree, recursing into each subdirectory and running the program on every file. However, this program writes a modified version of each file to STDOUT, and I want that output to go directly back into the file it read from instead.

Example

foo/
    bar.txt
    strob/
        strob.cpp
frob.java

The program would run on bar.txt, strob.cpp, and frob.java, and write directly to those files instead of STDOUT.

A fish solution is preferred, but a bash/POSIX-compatible solution will work as well.


Solution

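  • Reading from and writing to the same file is a major hole in Unix/Linux pipelines. When you redirect output to a file with >, the shell opens that file with truncation before the command even starts, so the file already has length 0 when the program tries to read it, and its reads return EOF immediately. (In a pipeline such as cat file | prog > file, where both sides start at the same time, it is even a race.) This is a limitation of the kernel API and affects all major shells.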

    The two workarounds are to either buffer the input in memory, or redirect the output to a temporary file and then move it into place. Here is an example in fish of the second approach, using sort to sort each file's lines in place. A naive sort < file.txt > file.txt leaves the file empty in all major shells, so we go through a temporary file:

    for file in (path filter -f **)
        set tmp (mktemp)
        # Replace the original only if sort succeeded; otherwise drop the temp file
        sort < $file > $tmp
        and mv $tmp $file
        or rm $tmp
    end
    

    or in one line:

    for file in (path filter -f **); set tmp (mktemp); sort < $file > $tmp; and mv $tmp $file; or rm $tmp; end
    

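    To break this down: path filter -f ** lists every regular file under the current directory, recursing into subdirectories (the path builtin needs fish 3.5 or later). For each such file, the loop creates an empty temporary file with mktemp, sorts the input into that temp file, and then moves it over the original. One caveat: mktemp creates its file with mode 600, so the mv also replaces the original file's permissions; the inplace function below avoids this by copying the result back with cat.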

    To generalize this, you can write a function that does this temp-file dance. Here inplace runs a command with the given file as its standard input, saves the command's output to a temporary file, and then copies that temp file back over the input file:

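    function inplace
        # Usage: inplace FILE COMMAND [ARGS...]
        set -l input_file $argv[1]
        set -l tmp (mktemp)
        # Feed the file to the command on stdin and capture its stdout in $tmp
        if $argv[2..] < $input_file > $tmp
            # Copy back with cat (not mv) so the file keeps its permissions
            cat $tmp > $input_file
        end
        rm $tmp
    end
    for file in (path filter -f **); inplace $file sort; end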
    

    This is not pleasant, but I think it is the best that can be done given the kernels bestowed upon us.
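The truncation problem described at the start of this answer is easy to reproduce. Below is a minimal POSIX sh sketch; it works inside a throwaway directory created with mktemp -d, and demo.txt is just a hypothetical scratch file name.

```sh
# Work in a throwaway directory so nothing real is touched
cd "$(mktemp -d)" || exit 1

# demo.txt is a hypothetical scratch file with three unsorted lines
printf 'b\na\nc\n' > demo.txt

# The shell opens demo.txt for writing (truncating it) *before* sort starts,
# so sort reads an empty file and writes nothing back.
sort < demo.txt > demo.txt

# Reports 0 bytes: the original three lines are gone
wc -c < demo.txt
```

The same thing happens in bash, dash, zsh and fish, because the redirection is set up before the program ever reads its input.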
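Since the question also accepts a bash/POSIX solution, here is a sketch of the same temp-file approach in plain sh, using find -type f -exec sh -c '...' sh {} + to visit every regular file recursively. As in the fish example, sort stands in for "the program"; the directory tree is a hypothetical copy of the one in the question. Two differences from the fish version: find also visits hidden files and directories (such as .git), which fish's ** skips by default, and mktemp is not strictly POSIX, although GNU coreutils and the BSDs all ship it.

```sh
# Recreate a hypothetical tree like the one in the question
cd "$(mktemp -d)" || exit 1
mkdir -p foo/strob
printf 'b\na\n' > foo/bar.txt
printf 'z\ny\n' > foo/strob/strob.cpp
printf '2\n1\n' > frob.java

# find passes batches of file names to one inner sh as "$@".
# For each file: run the filter with the file on stdin, capture stdout
# in a temp file, and copy it back only if the filter succeeded.
find . -type f -exec sh -c '
    for file do
        tmp=$(mktemp) || exit 1
        if sort < "$file" > "$tmp"; then
            # cat, not mv, so the original file keeps its permissions
            cat "$tmp" > "$file"
        fi
        rm -f "$tmp"
    done
' sh {} +
```

To use your own program instead of sort, replace sort < "$file" with that program, keeping the < "$file" redirection.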
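The other workaround mentioned above, buffering the input in memory, can be sketched in POSIX sh with command substitution. The helper name inplace_mem is hypothetical. Command substitution normally strips trailing newlines, so the sketch appends a sentinel x and removes it afterwards to write the output back byte for byte. Shell variables cannot hold NUL bytes, so this only suits text files, and the whole output must fit in memory.

```sh
# inplace_mem FILE COMMAND [ARGS...]   (hypothetical helper name)
# POSIX sh has no local variables, so file and out are global here.
inplace_mem() {
    file=$1
    shift
    # "echo x" runs only if the command succeeded; the x protects trailing
    # newlines from being stripped. On failure we return before touching FILE.
    out=$("$@" < "$file" && echo x) || return
    # Drop the sentinel and write the exact output back over the file
    printf '%s' "${out%x}" > "$file"
}

cd "$(mktemp -d)" || exit 1
printf 'b\na\nc\n' > demo.txt
inplace_mem demo.txt sort
```

If moreutils is installed, its sponge command does the same buffering for you: sort < file.txt | sponge file.txt reads all of its input before opening file.txt for writing.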