linux, awk, sed, txt

Remove single blank lines between lines of text in a .txt file on Ubuntu, keeping runs of multiple blank lines


Hello!!

As the title says, I need to remove the blank lines that separate two lines of text by exactly one blank line. In the image, these are the ones marked with a green line. The runs of several consecutive blank lines, marked with a red line, must stay as they are. I do not know whether awk, sed, or cut is the right tool, and I do not know how to do it. Thank you for your help.

I tried sed and awk as follows, but neither produced the desired result:

awk -F, '{gsub("\n","",$1); print}' archivo.txt

sed 's/ //g' input.txt > no-spaces.txt

Solution

  • Assumptions:

    • The input file has "\n" (not "\r\n") line endings.
    • A non-empty line contains at least two characters.
    • We don't have to care about an empty line at the beginning or the ending of the file.
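If the first assumption is in doubt, a quick check along these lines may help. This is only a sketch: the function names `count_cr_lines` and `to_unix_newlines` are made up for illustration, and `tr -d '\r'` removes every carriage return in the stream, not just those at line ends.

```shell
#!/bin/bash
# Hypothetical helpers, not part of the answer itself.

# Print the number of lines that contain a carriage return.
# A result of 0 suggests the file already uses "\n" line endings.
count_cr_lines() {
    grep -c $'\r'
}

# Drop all carriage returns, which turns "\r\n" endings into "\n".
to_unix_newlines() {
    tr -d '\r'
}

# Self-contained demo on a small CRLF sample.
printf 'line1\r\n\r\nline2\r\n' | count_cr_lines
printf 'line1\r\n\r\nline2\r\n' | to_unix_newlines | count_cr_lines
```

On a real file this could be used as `to_unix_newlines < input.txt > input-unix.txt` before running the sed command.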

    If GNU sed is available, which supports the -z option (used here to slurp the whole file) and the \n notation, please try:

    sed -Ez "s/([^\n]\n)\n([^\n])/\1\2/g" input.txt > no-spaces.txt
    

    Example of input.txt:

    line1
    line2 # following blank line should be removed
    
    line3 # following blank lines should be kept
    
    
    
    line4
    

    Output:

    line1
    line2 # following blank line should be removed
    line3 # following blank lines should be kept
    
    
    
    line4
    

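    sed normally processes its input line by line, so a pattern cannot match across a line break. The -z option changes this by setting the input line separator to the NUL character. Since a text file normally contains no NUL characters, the whole file is read as a single record, and the newlines become ordinary characters that the regex can match.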

    • ([^\n]\n) matches the last character of a non-blank line and the newline that ends it. It is captured as backreference \1.
    • \n is the blank line in between (to be removed).
    • ([^\n]) matches the first character of the following non-blank line. \2 is set as a backreference.
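The behaviour of the regex, including the reason for the "at least two characters" assumption, can be tried on small inputs. The sketch below wraps the same GNU sed command in a function whose name, `remove_single_blank_lines`, is chosen here for illustration. It requires GNU sed.

```shell
#!/bin/bash
# Hypothetical wrapper around the GNU sed command from the answer.
# Requires GNU sed (for -z and for \n inside a bracket expression).
remove_single_blank_lines() {
    sed -Ez 's/([^\n]\n)\n([^\n])/\1\2/g'
}

# Lines of two or more characters: the single blank line after line2
# is removed, the run of three blank lines after line3 is kept.
printf 'line1\nline2\n\nline3\n\n\n\nline4\n' | remove_single_blank_lines

# One-character lines: "b" is consumed as \2 of the first match, so it
# cannot also start a second match as \1. The blank line between "b"
# and "c" therefore stays in the output.
printf 'a\n\nb\n\nc\n' | remove_single_blank_lines
```

The second call shows why the answer assumes non-empty lines of at least two characters: with the g flag, matches cannot overlap, so a one-character line cannot be both the end of one match and the start of the next.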

    By the way, the following should work with any POSIX-compliant sed, with the help of bash:

    #!/bin/bash
    
    # define newline character for replacement
    NL=$'\\\n'
    
    sed -E '
    :l
    N
    $!b l
    # first slurp all lines in the pattern space
    # and perform the replacements over the lines
    s/([^'"$NL"']'"$NL"')'"$NL"'([^'"$NL"'])/\1\2/g
    ' input.txt > no-spaces.txt
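Since the question also asks about awk, here is an awk sketch of the same idea. It is a different technique from the sed commands above, not a translation of them: it counts each run of empty lines and drops the run only when it is exactly one line long and sits between two non-empty lines. The function name `remove_single_blank_lines_awk` is hypothetical, and "blank" is assumed to mean a completely empty line (no spaces or tabs).

```shell
#!/bin/bash
# Hypothetical awk alternative, not part of the answer above.
remove_single_blank_lines_awk() {
    awk '
        # Empty line: only count it; the decision is made when the run ends.
        /^$/ { blanks++; next }
        {
            # Drop a run of exactly one empty line that follows a non-empty
            # line. Any other run, including one at the start of the file,
            # is printed back unchanged.
            if (!(blanks == 1 && seen))
                for (i = 0; i < blanks; i++) print ""
            blanks = 0
            seen = 1
            print
        }
        # Empty lines at the end of the file are kept as they are.
        END { for (i = 0; i < blanks; i++) print "" }
    '
}

# Self-contained demo on the sample input from the answer.
printf 'line1\nline2\n\nline3\n\n\n\nline4\n' | remove_single_blank_lines_awk
```

On a real file it could be run as `remove_single_blank_lines_awk < input.txt > no-spaces.txt`. Because it works line by line, this version does not depend on the two-character assumption.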