I have a directory with more than 60000 files. How can I get only N of them without a find | head -n or ls | head -n solution, since reading the whole file list with find or ls takes too much time? Are there any options for ls or find, or any other programs, that could help save time?
For what it's worth:
# Create ~60000 files
sh$ for i in {0..100}; do
      for j in {0..600}; do
        touch $(printf "%05d" $(($i+$j*100)))
      done
    done
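A quick sanity check, if you want to confirm how many files the loop actually created (it touches a few names twice, but should still leave roughly 60000 distinct files; note that ls -f also counts . and ..):

sh$ ls -f | wc -l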
On Linux (Debian Wheezy, x86_64) with an ext4 file system:
sh$ time bash -c 'ls | head -n 50000 | tail -10'
49990
49991
49992
49993
49994
49995
49996
49997
49998
49999
real 0m0.248s
user 0m0.212s
sys 0m0.024s
sh$ time bash -c 'ls -f | head -n 50000 | tail -10'
27235
02491
55530
44435
24255
47247
16033
45447
18434
35303
real 0m0.051s
user 0m0.016s
sys 0m0.028s
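The speed-up comes from -f, which disables sorting (in GNU ls it behaves roughly like -aU), so ls can stream entries in directory order instead of reading and sorting all 60000 names first. If you only want to skip the sort, -U alone should be similarly fast; I have not re-timed this variant:

sh$ time bash -c 'ls -U | head -n 50000 | tail -10 >/dev/null'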
sh$ time bash -c 'find | head -n 50000 | tail -10'
./02491
./55530
./44435
./24255
./47247
./16033
./45447
./18434
./35303
./07658
real 0m0.051s
user 0m0.024s
sys 0m0.024s
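Note that find's listing is shifted by one entry relative to ls -f: find prints the starting point . as its first line, while ls -f lists . and .. among the directory entries. With GNU find you can restrict the output to the entries themselves; this variant is not re-timed here:

sh$ time bash -c 'find . -mindepth 1 -maxdepth 1 | head -n 50000 | tail -10 >/dev/null'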
sh$ time bash -c 'ls -f | sed -n 49990,50000p'
30950
27235
02491
55530
44435
24255
47247
16033
45447
18434
35303
real 0m0.046s
user 0m0.032s
sys 0m0.016s
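A small refinement on the sed variant: adding a quit command right after the last line of the range lets sed stop reading (and lets ls die early on the broken pipe, as discussed just below) instead of consuming the remaining ~10000 entries. Not re-timed here, but it should be at least as fast:

sh$ time bash -c 'ls -f | sed -n "49990,50000p;50000q"'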
Of course, the following two are faster, as they only take the first entries (and they interrupt the paired process with a broken pipe once the required number of lines has been read; a quick way to verify this is shown after the timings below):
sh$ time bash -c 'ls -f | sed 1000q >/dev/null'
real 0m0.008s
user 0m0.004s
sys 0m0.000s
sh$ time bash -c 'ls -f | head -1000 >/dev/null'
real 0m0.008s
user 0m0.000s
sys 0m0.004s
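To verify the broken-pipe behaviour mentioned above, bash's PIPESTATUS array reports the exit status of each command in the last pipeline; an ls killed by SIGPIPE should show up as 141 (128 + signal 13), although whether it is actually killed before finishing can depend on buffering:

sh$ bash -c 'ls -f | head -1000 >/dev/null; echo "${PIPESTATUS[@]}"'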
Interestingly enough, with sed the time is spent in user space, whereas with head it is spent in the kernel (sys). After several runs, the results remain consistent.
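One way to dig into that user/sys split, if strace is available, is to let it summarise the system calls made by the whole pipeline (the exact numbers will vary from system to system; this is just a probe, not part of the measurements above):

sh$ strace -c -f bash -c 'ls -f | head -1000 >/dev/null'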