
How to replace random lines with empty string 30% of the time in bash?


Assuming that I created a file with 10 lines:

yes "foo bar" | head -n 10 > foobar.txt

[out]:

foo bar
foo bar
foo bar
foo bar
foo bar
foo bar
foo bar
foo bar
foo bar
foo bar

And I want to randomly replace 30% of the lines with an empty line, so that the file looks like this:

foo bar

foo bar

foo bar

foo bar
foo bar

foo bar

I could technically write a script that calls Python to generate a random number and compares it against the ratio, e.g. ratio-foobar.sh:

#!/bin/bash

ratio=$1
numlines=$2

coinflip() {
  randnum=$(bc -l <<< $(python -S -c "import random; print(int(random.random() * 100))"))
  if [ $randnum -gt $ratio ]
  then 
     return 1
  else
     return 0
  fi
}

for i in $(seq 1 $numlines);
do
  if coinflip
  then
    echo "foo bar"
  else
    echo ""
  fi
done

Usage:

bash ratio-foobar.sh 33 10 > foobar.txt

[out]:

foo bar
foo bar
foo bar

foo bar


foo bar
foo bar


But is there a simpler way to generate the lines (maybe with yes) so that a certain percentage of them come out empty?


I tried to use @renaud-pacalet's solution, but reading a float into the shell turned out to be messy, and bc got involved again. Somehow this didn't work:

ratio=$1
lines=$2

ratio=$(echo "scale=3; $ratio/100" | bc)

yes "foo bar" | head -n $2 | awk 'BEGIN {srand()} {print rand() < $ratio ? "" : $0}' > output

cat output

Use: bash flip.sh 33 10 for 33% and 10 lines of foo bar.

But when the ratio is hard-coded, it worked:

ratio=$1
lines=$2

ratio=$(echo "scale=3; $ratio/100" | bc)

yes "foo bar" | head -n $2 | awk 'BEGIN {srand()} {print rand() < 0.3? "" : $0}' > output

cat output

Is there a way to read the percentage from the arguments and make the yes | head | awk pipeline work properly?


Solution

  • Since you can apparently use Python, you could do it all in Python:

    from random import randrange as rnd
    def foo(n, r, s):
        for i in range(n):
            print("" if rnd(100) < r else s)
    foo(10, 33, "foo bar")
    

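    Here n is the number of lines to print, r is the percentage of empty lines and s is the string to print. Each line is blanked independently with probability r/100, so the number of empty lines varies from run to run. See the argparse module if you want to pass arguments to a Python script.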

    You could do the same with any POSIX awk (tested with GNU awk):

    awk -v n=10 -v r=33 -v s="foo bar" '
    END {srand(); for(i=1; i<=n; i++) print rand() < r/100 ? "" : s}' /dev/null
    

    Or, with plain bash:

    n=10; r=33; s="foo bar"
    for (( i=1; i<=n; i++ )) ; do
      (( SRANDOM % 100 < r )) && echo "" || echo "$s"
    done
    

    The SRANDOM special variable (available since bash 5.1) expands to a 32-bit random number. Because 2 to the power of 32 is not a multiple of 100, the probability of an empty line is not exactly 33%, but the difference is very small.
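
As for the yes | head | awk attempt in the question: inside single quotes the shell does not expand $ratio, so awk receives the literal text $ratio and reads it as a field reference indexed by an unset awk variable, not as 0.33. A minimal sketch of one way around this, assuming the same two positional arguments as flip.sh, is to hand the value to awk with -v and do the division there, which also makes bc unnecessary. The function name blank_some_lines is only illustrative and does not come from the question.

```shell
#!/bin/bash
# blank_some_lines RATIO LINES  (hypothetical helper name, for illustration only)
# Prints LINES copies of "foo bar"; each one is replaced by an empty line
# with probability RATIO percent.
blank_some_lines() {
  local ratio=$1 lines=$2
  # -v copies the shell value into an awk variable called ratio.
  # Dividing by 100 inside awk avoids the bc step from the question.
  yes "foo bar" | head -n "$lines" |
    awk -v ratio="$ratio" 'BEGIN { srand() } { print (rand() < ratio / 100 ? "" : $0) }'
}

# Defaults to 33% and 10 lines when no arguments are given.
blank_some_lines "${1:-33}" "${2:-10}"
```

Saved as flip.sh, this would be called as bash flip.sh 33 10. Note that srand() without an argument seeds from the time of day, so two runs started within the same second can produce the same pattern.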