Does
find /mnt/Dataset/ -type f | shuf -n 50
do the trick? Does shuf wait to read all the lines and then make a random selection? Does shuf give each line the same probability? Or should I use another tool?
When you are wondering how shuf behaves in a pipeline (does it wait for the whole input to arrive, or does it process data as it becomes available?), you can write a test. The test will look like:
for ((i=0; i<20; i++)); do
(printf "%s\n" {1..9}; sleep 0.1; echo 10) | shuf | tr '\n' ' '
echo
done
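An even quicker way to see the buffering is to time a single run. This is a sketch; the 2-second sleep is an arbitrary choice, long enough to notice the pause:

```shell
# shuf cannot print anything until it has read all of its input, so this
# pipeline is silent for about 2 seconds and then emits all 10 numbers
# at once; a streaming tool would print 1-9 immediately.
time { (printf "%s\n" {1..9}; sleep 2; echo 10) | shuf; }
```

If the numbers appeared before the timer reached 2 seconds, shuf would be processing data as it arrives; in practice nothing is printed until the input is closed.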
The loop above omits the -n option, and to look at averages you want a larger sample. The next loop is better for testing:
for ((i=0; i<10000; i++)); do
(printf "%s\n" {1..9}; sleep 0.01; echo 10) | shuf | tr '\n' ' '
echo
done > sample.txt
# Look for how often 10 is the last number on a line
grep -c "10 $" sample.txt
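The grep only checks the last position. Reusing the same generator (with a smaller sample of 300 lines to keep the runtime short), this sketch tallies every position where the delayed value 10 lands:

```shell
# Build 300 shuffled lines where the value 10 arrives late, then tally
# which field it lands in. A buffering shuf spreads it roughly evenly
# (about 30 hits per position); a streaming tool would put all 300 hits
# in position 10.
for ((i = 0; i < 300; i++)); do
    (printf "%s\n" {1..9}; sleep 0.01; echo 10) | shuf | tr '\n' ' '
    echo
done |
awk '{ for (i = 1; i <= NF; i++) if ($i == 10) pos[i]++ }
     END { for (i = 1; i <= 10; i++) printf "position %2d: %d\n", i, pos[i] + 0 }'
```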
I also tested the distribution of the first number on each line:
cut -d " " -f1 sample.txt | sort | uniq -c
1040 1
985 10
976 2
1012 3
981 4
999 5
1043 6
974 7
979 8
1011 9
I did not formally check the deviations against the sample size, but the counts look consistent with a uniform distribution.
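A rough formal check can be done with a chi-squared statistic. This is a sketch with the counts above hard-coded; for 10 equally likely values the expected count is 1000 each, and with 9 degrees of freedom a statistic below about 16.92 is consistent with uniformity at the 5% level:

```shell
# First-field counts from the 10,000-line sample, in the order printed
# by sort | uniq -c above (values 1, 10, 2, 3, ..., 9).
counts="1040 985 976 1012 981 999 1043 974 979 1011"

# Chi-squared statistic against a uniform expectation: sum of
# (observed - expected)^2 / expected over all 10 values.
echo "$counts" | awk '{
    n = NF; total = 0
    for (i = 1; i <= n; i++) total += $i
    expected = total / n
    chi2 = 0
    for (i = 1; i <= n; i++) {
        d = $i - expected
        chi2 += d * d / expected
    }
    printf "chi2 = %.2f (df = %d)\n", chi2, n - 1
}'
# prints: chi2 = 5.99 (df = 9)
```

5.99 is well below 16.92, so these counts give no reason to doubt that shuf is uniform.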