I have a text file made up of numbered entries, each with a timecode and a transcript. I want to remove the line breaks inside the transcript text and leave all the others in place. I'm trying to do this with grep or awk.
The file looks like this:
1
00:00:27,160 --> 00:00:29,054
Sometimes there's not much dialogue.

2
00:00:30,100 --> 00:00:31,090
But other times there is quite a bit,
and it's formatted into two lines

3
00:00:31,500 --> 00:00:33,700
I want to remove the line breaks only on
these long lines, leaving all other formatting.

4
00:00:33,805 --> 00:00:37,285
So that all dialogue ends up being on a single
line no matter how long that line.
The output should look like this:
1
00:00:27,160 --> 00:00:29,054
Sometimes there's not much dialogue.

2
00:00:30,100 --> 00:00:31,090
But other times there is quite a bit, and it's formatted into two lines

3
00:00:31,500 --> 00:00:33,700
I want to remove the line breaks only on these long lines, leaving all other formatting.

4
00:00:33,805 --> 00:00:37,285
So that all dialogue ends up being on a single line no matter how long that line.
Thanks to all who have provided help.
Don't rely on lines starting (or not) with any specific characters. Instead, read each blank-line-separated block as one record (RS="") with one line per field (FS="\n"), and attach the 4th and subsequent lines of each record to the end of its 3rd line:
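$ awk '
BEGIN { RS=ORS=""; FS=OFS="\n" }    # paragraph mode: one record per block, one field per line
{
    print $1,$2,$3                  # entry number, timecode, first dialogue line
    for (i=4;i<=NF;i++)
        printf " %s", $i            # append each further dialogue line after a space
    print "\n\n"                    # end the dialogue line and emit the blank separator
}
' file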
1
00:00:27,160 --> 00:00:29,054
Sometimes there's not much dialogue.

2
00:00:30,100 --> 00:00:31,090
But other times there is quite a bit, and it's formatted into two lines

3
00:00:31,500 --> 00:00:33,700
I want to remove the line breaks only on these long lines, leaving all other formatting.

4
00:00:33,805 --> 00:00:37,285
So that all dialogue ends up being on a single line no matter how long that line.
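
For anyone who wants to try the approach without a file at hand, here is a minimal self-contained sketch of the same paragraph-mode technique. It is a variant, not the script above: the function name join_dialogue is made up for illustration, it keeps the default ORS, and it writes the blank separator before each record after the first, so the output does not end with an extra blank line. It also assumes the records are separated by blank lines.

```shell
# join_dialogue is a hypothetical helper name, not part of awk or of the script above.
# It reads the named files, or standard input when no file is given.
join_dialogue() {
    awk '
    BEGIN { RS = ""; FS = "\n" }    # paragraph mode: one record per block, one field per line
    {
        if (NR > 1) print ""        # blank line between records, none after the last one
        for (i = 1; i <= NF; i++)
            # fields 1 and 2 keep their line break; from field 3 on,
            # fields are joined with a space until the last one
            printf "%s%s", $i, (i < 3 || i == NF ? "\n" : " ")
    }
    ' "$@"
}

# Example run on two records taken from the question.
join_dialogue <<'EOF'
1
00:00:27,160 --> 00:00:29,054
Sometimes there's not much dialogue.

2
00:00:30,100 --> 00:00:31,090
But other times there is quite a bit,
and it's formatted into two lines
EOF
```

Because the loop runs over however many fields a record has, an entry with fewer than three lines is printed unchanged rather than padded with empty lines.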