I want to replace the newlines (\n) in a file, but only when the next line starts with optional spaces followed by a less-than character (\s*<).
Example Text:
FIRST LINE ('<foo>
<bar>
<baz>')
ANOTHER LINE 'lorem ipsim', '<dolor>
<and>
<p>again</p>
</and>
</dolor>'
I need to do that on the command line using sed, perl, tr, ...
I tried several commands, but none has worked so far.
Basically, this is what I tried: sed -i -e 's|\n+\s*\<|<|gm' filename
It seems like sed does not look further than the newline.
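That observation is right: sed reads its input one line at a time and strips the trailing newline before running the script, so the `\n` in the pattern above never has anything to match. Separately, in GNU sed `\<` is a start-of-word anchor rather than a literal `<`, and a bare `+` is literal in a basic regex. A minimal sketch, assuming GNU sed (the `:a` / `N` / `$!ba` loop is a common idiom for pulling the whole file into the pattern space) and using a temporary file for the question's sample text:

```shell
#!/bin/sh
# Write the question's sample text to a temporary file.
tmp=$(mktemp)
printf '%s\n' "FIRST LINE ('<foo>" "<bar>" "<baz>')" \
  "ANOTHER LINE 'lorem ipsim', '<dolor>" "<and>" "<p>again</p>" \
  "</and>" "</dolor>'" > "$tmp"

# :a / N / $!ba keeps appending the next line to the pattern space until
# the last line has been read, so s/// sees the embedded newlines.
# [[:blank:]]* eats spaces/tabs at the start of the following line.
# The < is deliberately left unescaped (\< would be a word anchor).
result=$(sed -e ':a' -e 'N' -e '$!ba' -e 's/\n[[:blank:]]*</</g' "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

With GNU sed, adding -i as in the original attempt edits the file in place.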
https://regex101.com/r/VkRO9o/3
Is there any command that can do that?
EDIT: Expected Output:
FIRST LINE ('<foo> <bar><baz>')
ANOTHER LINE 'lorem ipsim', '<dolor><and><p>again</p></and></dolor>'
It's fine if the spaces aren't replaced.
You may use perl for this:
perl -0777 -pe "s/\h*\R+\h*([<'])/\$1/g" file
FIRST LINE ('<foo><bar><baz>')
ANOTHER LINE 'lorem ipsim', '<dolor><and><p>again</p></and></dolor>'
Details:
-0777: Enable slurp mode, so the whole file is read as a single string and the regex can match across newlines
\h*\R+\h*([<']): Match 0+ horizontal whitespace characters, followed by 1+ line breaks, followed by 0+ horizontal whitespace characters and then < or '. Note that we are capturing < or ' in group #1.
Replace this match with $1, which is the < or ' that we've captured in group #1.