bash, sed, bsd

match repeated character in sed on mac


I am trying to find all instances of 3 or more new lines and replace them with only 2 new lines (imagine a file with wayyy too much white space). I am using sed, but I'm OK with an answer using awk or the like if that's easier.

Note: I'm on a Mac, so sed is slightly different from the one on Linux (BSD vs. GNU).

My actual goal is new lines, but I can't get it to work at all so for simplicity I'm trying to match 3 or more repetitions of bla and replace that with BLA.

Make an example file called stupid.txt:

$ cat stupid.txt

blablabla
$

My understanding is that you match i or more things using regex syntax thing{i,}.
I have tried variations of this to match the 3 blas with no luck:

cat stupid.txt | sed 's/bla{3,}/BLA/g'      # simplest way
cat stupid.txt | sed 's/bla\{3,\}/BLA/g'    # escape curly brackets
cat stupid.txt | sed -E 's/bla{3,}/BLA/g'   # use extended regular expressions
cat stupid.txt | sed -E 's/bla\{3,\}/BLA/g' # use -E and escape brackets

Now I am out of ideas for what else to try!


Solution

  • If slurping the whole file is acceptable:

    perl -0777pe 's/(\n){3,}/\n\n/g' newlines.txt
    

    Replace \n with whatever newline sequence your file uses (for example, \r\n for Windows line endings).

    -0777 tells perl to read the whole file as a single record (slurp mode) instead of one line at a time, which is what allows a regex to match across lines.

    If you are satisfied with the result, -i causes perl to replace the file in-place rather than output to stdout:

    perl -i -0777pe 's/(\n){3,}/\n\n/g' newlines.txt
    

    You can also write -i~ to keep a backup of the original file with the given suffix (~ in this case).

  • If slurping the whole file is not acceptable:

    perl -ne 'if (/^$/) {$i++}else{$i=0}print if $i<3' newlines.txt
    

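    This prints every line except the third (or later) consecutive empty line. Note that it counts empty lines rather than newline characters, so it leaves up to two empty lines between blocks of text where the slurping version leaves one; change $i<3 to $i<2 if you want at most one. -i works the same way with this command.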

    P.S. macOS comes with perl installed.
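
As for the bla test in the question: a bound such as {3,} applies only to the single item right before it, so bla{3,} means "bl followed by three or more a", not "bla three or more times". Grouping the word should fix that on both BSD and GNU sed. A minimal sketch (collapse_bla is a made-up name for this illustration):

```shell
# Hypothetical helper: group "bla" so the bound applies to the whole word.
# Extended syntax (-E): plain parentheses and braces.
collapse_bla() { sed -E 's/(bla){3,}/BLA/g'; }

# Basic syntax (no -E): escape both the group and the bound.
collapse_bla_basic() { sed 's/\(bla\)\{3,\}/BLA/g'; }

printf 'blablabla\n' | collapse_bla
printf 'blablabla\n' | collapse_bla_basic
```

This does not carry over directly to the newline case, because sed reads one line at a time and the run of newlines never ends up in the pattern being matched, which is why the commands above switch to perl.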