I would like to generate a random file of size 2MB that consists only of 0's and 1's, on either Linux or Windows, for one of my projects. I tried this command in Linux:
$ time dd if=/dev/urandom of=/dev/null bs=1M count=2
but /dev/urandom only provides raw random bytes from the kernel, and dd just copies them to the output, which is not what I need. Any ideas regarding this?
EDIT: All these solutions are pretty bad in practice. tripleee's proposal (pipe the output of /dev/urandom to
perl -0777 -ne 'print unpack("b*")'
) in the question's comments is much better.
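For readers who prefer Python, the same idea (take random bytes from the operating system and print each byte as eight 0/1 characters) can be sketched as follows. This is only an illustration under stated assumptions: the function name `random_bit_string` is made up here, Python 3 is assumed, and the bits of each byte come out most-significant first rather than in the order Perl's `b*` uses, which should not matter for random data.

```python
import os

def random_bit_string(n_chars):
    """Return a str of n_chars characters, each '0' or '1' (hypothetical helper)."""
    n_bytes = (n_chars + 7) // 8      # one random byte yields 8 output characters
    raw = os.urandom(n_bytes)         # OS-provided random bytes
    # Treat the bytes as one big integer and render it in binary,
    # left-padding with zeros so that no leading bits are lost.
    bits = bin(int.from_bytes(raw, "big"))[2:].zfill(n_bytes * 8)
    return bits[:n_chars]

# Usage (writes a 2 MiB file of 0/1 characters):
# with open("random.txt", "w") as f:
#     f.write(random_bit_string(2 * 1024 * 1024))
```

Only 256 KiB of random bytes are needed for a 2 MiB file, because every byte expands to eight characters.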
Do you need something fast? If not, you can try this (it took about 2 minutes for me):
$ time (for i in `seq 1 $((2*1024*1024))`;
do echo -n $(($RANDOM%2)); done > random.txt)
You can make it faster by calling $RANDOM less often, for example:
$ time (i=$((2*1024*1024)); a=0; while [ $i -gt 0 ]; do if [ $a -lt 2 ]; then
a=$RANDOM; fi; echo -n "$(($a%2))"; let a=$a/2; let i=$i-1; done > random.txt)
It's nearly 4 times faster in my case. It works by repeatedly extracting the rightmost bit of the number and halving it; once the value drops below 2, a new $RANDOM is drawn. It may therefore be slightly biased toward 1.
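One way to check the bias concern is to replay the loop's logic for every possible draw and count the bits it would print. The sketch below assumes $RANDOM is uniform over 0..32767 (as bash documents it); the helper name `emitted_bits` is made up for illustration. Under that assumption the two counts come out equal, which suggests the loop is not biased: the leading 1 of each draw is never printed, because the value is redrawn as soon as it drops below 2.

```python
def emitted_bits(r):
    """Bits the shell loop prints for a single draw r of $RANDOM (hypothetical helper)."""
    bits = []
    a = r
    while True:
        bits.append(a % 2)   # echo -n "$(($a%2))"
        a //= 2              # let a=$a/2
        if a < 2:            # the next iteration would draw a new $RANDOM
            break
    return bits

zeros = ones = 0
for r in range(32768):       # every value $RANDOM can take, each equally likely
    for b in emitted_bits(r):
        if b:
            ones += 1
        else:
            zeros += 1

print(zeros, ones)           # the two totals are equal
```

This says nothing about the quality of $RANDOM itself, only about how the loop consumes it.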
However, if you want a fast solution, you should clearly not use a shell scripting language. You can do it easily in Python (this takes ~2 seconds in my case):
$ time (python -c "import random; print(''.join('{0}'.format(n) for n in
random.sample([0,1]*16*1024*1024, 2*1024*1024)));" > random.txt)
Here I'm randomly sampling from a big list of 0s and 1s. However, I'm not sure how sampling affects the quality of the randomness. If the list were huge compared to the sample, I think it would give a good-quality result, but here it's only 16 times bigger (2*16*1024*1024 elements for a sample of 2*1024*1024), so it probably has a measurable impact.
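The size of that impact can be estimated. Sampling without replacement from a list that is half 0s and half 1s makes the number of 1s in the sample hypergeometric rather than binomial, so its variance shrinks by the finite population correction (N - n) / (N - 1). A small sketch of the arithmetic, with variable names chosen here for illustration:

```python
import math

population = 2 * 16 * 1024 * 1024   # len([0,1]*16*1024*1024)
sample = 2 * 1024 * 1024            # number of characters written
p = 0.5                             # fraction of 1s in the population

# Variance of the count of 1s for independent fair coin flips (binomial).
binomial_var = sample * p * (1 - p)

# Finite population correction for sampling without replacement.
fpc = (population - sample) / (population - 1)
hypergeometric_var = binomial_var * fpc

print(fpc)             # about 0.9375
print(math.sqrt(fpc))  # about 0.968
```

In other words, the count of 1s is spread about 3% more tightly around one half than true coin flips would be: the file is slightly "too balanced".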
Note that randomness is not as easy as it may seem. The outputs of the solutions I propose here don't all have the same properties, and verifying which properties each one has is often complex. You may want to trade performance for 'better' randomness, in which case this version in Python may be better (~6 seconds in my case):
$ time (python -c "from __future__ import print_function; import random;
[print(random.randint(0,1), end='') for i in range(0, 2*1024*1024)];" > random.txt)
Here, random.randint should provide an evenly distributed result.
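A basic sanity check on any of the generated files is to compare the number of 0s and 1s. The sketch below is only a frequency check, with a made-up function name `monobit_z`; it does not test whether the bits are independent of each other. For fair, independent bits the returned value should behave roughly like a standard normal variable, so magnitudes well above 3 would be suspicious.

```python
import math

def monobit_z(text):
    """Normalized difference between the counts of '1' and '0' (hypothetical helper)."""
    ones = text.count("1")
    zeros = text.count("0")
    n = ones + zeros                  # ignores any other character, e.g. a newline
    return (ones - zeros) / math.sqrt(n)

# Usage:
# with open("random.txt") as f:
#     print(monobit_z(f.read()))
```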