Tags: awk, sed, command-line-interface, nano

dejustify (unjustify): replace each *single* linefeed with a space, but don't touch groups of linefeeds. sed, awk, or something else?


I'm looking for a linux command-line solution to the problem:

"Replace each single linefeed with a space, but don't modify any groups of consecutive linefeeds i.e. do not modify any linefeed which has another linefeed next to it." As an example:

one two
three four
five six

seven eight
nine ten

should become:

one two three four five six

seven eight nine ten

I am already aware that every valid text file should end with a linefeed, but if your proposed solution deletes that final-character-linefeed, that would not be a problem (it would be easy for me to append it back on afterwards).

I think that this is "too complex a task" for tr, but I assume something should be possible in sed or awk (if not, then I'll need to "rustle up" something in python or c). Unfortunately, my sed-fu is weak (as is my awk-fu) - are there any sed/awk black-belts around that could please help me?

I have already found How can I replace each newline (\n) with a space using sed? but of course the suggested answers to that question wipe out my "multiple consecutive linefeeds" (which I want to preserve).

I am also aware that "Sed is line-based therefore it is hard for it to grasp newlines" - perhaps sed is not the best tool for this job.

I have also found Replace only single instance of a character using sed but of course the character being replaced in that question is not a (problematic) linefeed.

(Why do I want this? The nano editor has a justify function which adds and removes single linefeeds so that any line "fills" the chosen line length but does not overrun it. nano does have a "built-in" unjustify function, but this is really just an "undo", not a "real" unjustify. What I am trying to find is the closest thing to a "genuine" unjustify command.)

Update: all the current solutions work perfectly, and thank you to all those who provided them. I've accepted Ed Morton's for the reasons that he gives: his solution processes only 1 line of input at a time, and it's portable to non-GNU versions of its tool. The solution to my nano problem is:

cat << 'EOF' > "$HOME/.local/bin/dejustify"
#!/bin/sh
awk -v RS= 'NR>1{print ""} {$1=$1} 1' < "${1:-/dev/stdin}"
EOF
chmod u+x "$HOME/.local/bin/dejustify"

(I found the < "${1:-/dev/stdin}" here.)

I can now use it in a pipeline (printf "one\ntwo\nthree\nfour\n" | dejustify) or just dejustify <filename>.

Inside nano, I can <Ctrl>+<t> then enter |dejustify to dejustify my text. Success! 🙏
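For anyone who wants to try the behaviour before installing the script, here is a minimal sketch of the same awk program wrapped in a shell function. The name dejustify_fn is made up for illustration, and unlike the script above it passes "$@" to awk rather than redirecting from /dev/stdin (awk reads standard input when given no file operands), so treat it as an approximation of the script rather than a copy.

```shell
# Hypothetical function form of the dejustify script, for quick experiments.
# awk reads stdin when no file operand is given, so "$@" covers both uses.
dejustify_fn() {
    awk -v RS= 'NR>1{print ""} {$1=$1} 1' "$@"
}

# Pipeline use: four single-word lines become one line.
printf 'one\ntwo\nthree\nfour\n' | dejustify_fn
# prints: one two three four
```

Because this version keeps awk's default field separator, runs of spaces and tabs inside a paragraph are also squeezed to a single space, which is usually what you want when undoing a justify.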


Solution

  • Using any awk:

    $ awk -v RS= -v ORS='\n\n' -F'\n' '{$1=$1} 1' file
    one two three four five six
    
    seven eight nine ten
    
    

    Breaking it down:

    • -v RS= treat the input as [possibly multi-line] records separated by 1 or more empty lines.
    • -v ORS='\n\n' put 2 newlines at the end of each output record.
    • -F'\n' set the field separator to a newline so that ONLY newlines get replaced in the next step, otherwise all chains of contiguous white space within each record would be replaced.
    • {$1=$1} update the value of a field, $1, thereby causing awk to rebuild the current record replacing all strings that match the FS (a newline) with an OFS (a blank char).
    • 1 a true condition causing awk to execute its default action of printing the current record.
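To make the effect of -F'\n' concrete, here is a small sketch that runs the command with and without it. The sample text and the names sample, with_nl_fs and default_fs are invented for illustration; it assumes a POSIX shell and any awk.

```shell
# Hypothetical sample: two paragraphs, with a run of three spaces in the first line.
sample() { printf 'one   two\nthree four\n\nfive six\nseven\n'; }

# With -F'\n': only the newlines inside each paragraph become single spaces.
with_nl_fs=$(sample | awk -v RS= -v ORS='\n\n' -F'\n' '{$1=$1} 1')

# With the default FS: every run of blanks and newlines collapses to one space.
default_fs=$(sample | awk -v RS= -v ORS='\n\n' '{$1=$1} 1')

printf '%s\n---\n%s\n' "$with_nl_fs" "$default_fs"
```

The first result keeps "one   two" with its three spaces intact, while the second squeezes it to "one two". Which one you want depends on whether the original spacing inside lines matters to you.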

    The above will print a blank line at the end of the output. If that's a problem, you can do this instead:

    $ awk -v RS= -F'\n' 'NR>1{print ""} {$1=$1} 1' file
    one two three four five six
    
    seven eight nine ten
    

    which prints a blank line before each record except the first instead of printing a blank line after every record:

    • NR>1{print ""} if this is the second or subsequent record then print a blank line before it.
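As a quick check of the difference between the two variants, the sketch below counts the output lines of each on the question's sample input. The names sample, lines_ors and lines_nr are made up for illustration; lines are counted with awk's NR to avoid the padding some wc implementations add.

```shell
# The sample input from the question.
sample() { printf 'one two\nthree four\nfive six\n\nseven eight\nnine ten\n'; }

# Variant 1: ORS='\n\n' prints a blank line after every record, including the last.
lines_ors=$(sample | awk -v RS= -v ORS='\n\n' -F'\n' '{$1=$1} 1' | awk 'END{print NR}')

# Variant 2: a blank line is printed before every record except the first.
lines_nr=$(sample | awk -v RS= -F'\n' 'NR>1{print ""} {$1=$1} 1' | awk 'END{print NR}')

echo "ORS variant: $lines_ors lines; NR>1 variant: $lines_nr lines"
# prints: ORS variant: 4 lines; NR>1 variant: 3 lines
```

The extra line in the first count is the trailing blank line mentioned above.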