bash, find, ls, head, tail

How to read the first N files from a directory (please, NOT a "head -n" solution)?


I have a directory with more than 60000 files. How can I get only N of them without using a find | head -n or ls | head -n solution, since find and ls take too much time to read the full list of files? Are there any options for ls and find, or any other programs, that can help save time?


Solution

  • For what it's worth:

    # Create 60000 files, named 00000 through 59999 (brace expansion requires bash)
    sh$ for i in {0..99}; do
        for j in {0..599}; do
            touch "$(printf "%05d" $((i + j*100)))";
        done;
    done
    

    On Linux Debian Wheezy x86_64 w/ext4 file system:

    sh$ time bash -c 'ls | head -n 50000 | tail -10'
    49990
    49991
    49992
    49993
    49994
    49995
    49996
    49997
    49998
    49999
    
    real    0m0.248s
    user    0m0.212s
    sys 0m0.024s
    


    sh$ time bash -c 'ls -f | head -n 50000 | tail -10'
    27235
    02491
    55530
    44435
    24255
    47247
    16033
    45447
    18434
    35303
    
    real    0m0.051s
    user    0m0.016s
    sys 0m0.028s
    


    sh$ time bash -c 'find | head -n 50000 | tail -10'
    ./02491
    ./55530
    ./44435
    ./24255
    ./47247
    ./16033
    ./45447
    ./18434
    ./35303
    ./07658
    
    real    0m0.051s
    user    0m0.024s
    sys 0m0.024s
    


    sh$ time bash -c 'ls -f | sed -n 49990,50000p'
    30950
    27235
    02491
    55530
    44435
    24255
    47247
    16033
    45447
    18434
    35303
    
    real    0m0.046s
    user    0m0.032s
    sys 0m0.016s
    

    Of course, the following two are faster, as they only take the first entries: once the required lines have been read, sed or head exits, and the peer process (ls) is terminated by a broken pipe on its next write:

    sh$ time bash -c 'ls -f | sed 1000q >/dev/null'
    
    real    0m0.008s
    user    0m0.004s
    sys 0m0.000s
    


    sh$ time bash -c 'ls -f | head -1000>/dev/null'
    
    real    0m0.008s
    user    0m0.000s
    sys 0m0.004s
    

    Interestingly, with sed the time is spent in user space, whereas with head it is spent in the kernel (sys). The results are consistent across several runs.
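
The measurements above suggest that an unsorted listing cut short by head is the cheap path. A minimal sketch of that idea as a reusable bash function follows. The name first_n is hypothetical and not part of the original answer. It assumes an ls that supports -f (unsorted, which also shows hidden entries) and file names without embedded newlines.

```shell
#!/usr/bin/env bash
# first_n DIR N  (hypothetical helper, for illustration only)
# Print up to N entry names from DIR in raw directory order.
first_n() {
    local dir=$1 n=$2
    # ls -f: do not sort, so ls can emit names as it reads the directory.
    # grep -vxF: drop the literal entries "." and "..", which -f includes.
    # head -n: exit after N lines; the upstream commands then stop on a
    #          broken pipe instead of listing the whole directory.
    ls -f -- "$dir" | grep -vxF -e . -e .. | head -n "$n"
}
```

Called as `first_n /path/to/dir 1000`, it prints at most 1000 names. Because the order is whatever the file system returns, "first" here means first in directory order, not first alphabetically, as the ls -f output above shows.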