I have this tsv (tab separated) file with 2 columns. The first column is a single word (or a group of words) and the second column is its meaning.
test file
test try
test "a short exam to measure somebody's knowledge
or skill in something."
testing examine
I am trying to merge the second and third lines because the text is in double quotes, e.g.
Expected Output
test try
test "a short exam to measure somebody's knowledge or skill in something."
testing examine
I tried this:
awk -v FS='\t' -v OFS='\t' '{print $1, $2}' test.tsv
test try
test "a short exam to measure somebody's knowledge
or skill in something."
testing examine
But it does not merge lines 2 and 3. I also tried patsplit, and that merged all lines together:
awk 'BEGIN { FS=OFS="\t"}
{
if (patsplit($0,a,/"[^"]+"/,s)) {
gsub(/\n/,"",a[1])
printf "%s%s%s", s[0],a[1],s[1]
}
else
printf "%s", $0
printf ";"
}' test.tsv
I need to keep the tab separated format of the original file. The only change required is to merge text that is enclosed in a pair of double quotes onto one line.
To just replace each newline within quoted fields with a blank character, using GNU awk for multi-char RS and RT:
$ awk -v RS='"[^"]*"' '{gsub(/\n/," ",RT); ORS=RT} 1' file
test try
test "a short exam to measure somebody's knowledge or skill in something."
testing examine
The above will work no matter where the double quotes appear in your input, even if your quoted string includes double quotes that have been escaped by putting a second double quote next to them as is common in quoted fields in CSVs, TSVs, etc., e.g.:
$ cat file
test try
test a short exam to measure somebody's "knowledge
or skill" in something.
testing examine
test try
test "a short exam to measure somebody's ""knowledge""
or skill in something."
testing examine
$ awk -v RS='"[^"]*"' '{gsub(/\n/," ",RT); ORS=RT} 1' file
test try
test a short exam to measure somebody's "knowledge or skill" in something.
testing examine
test try
test "a short exam to measure somebody's ""knowledge"" or skill in something."
testing examine
See "What's the most robust way to efficiently parse CSV using awk?" for more info on parsing CSVs (which can also be applied to TSVs) with awk.
In response to the comments below: the awk command is doing exactly 1 thing every time, replacing each \n with a blank; that's what gsub(/\n/," ",...) does. If that's not what you want then just don't do that, do whatever you want to do instead, but you never said in your question how you want to merge the lines so I had to guess at something.
I'd recommend you don't just remove newlines as the other solutions do, as that will concatenate words if there ever aren't spaces around the \ns, but maybe you want gsub(/[[:space:]]*\n[[:space:]]*/," ",...) or similar, I don't know.
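To make the difference between those choices concrete, here is a small sketch that applies each gsub() to a made-up string inside a BEGIN block. The sample strings and the variable names removed, blanked and folded are assumptions for illustration only; no input file is read.

```shell
# Removing the newline glues the two words together.
removed=$(awk 'BEGIN { s = "knowledge\nor skill"; gsub(/\n/, "", s); print s }')

# Replacing only the newline keeps the blank before it and the tab after it.
blanked=$(awk 'BEGIN { s = "knowledge \n\tor skill"; gsub(/\n/, " ", s); print s }')

# Folding the newline plus surrounding whitespace leaves exactly one blank.
folded=$(awk 'BEGIN { s = "knowledge \n\tor skill"; gsub(/[[:space:]]*\n[[:space:]]*/, " ", s); print s }')

printf 'removed: [%s]\nblanked: [%s]\nfolded:  [%s]\n' "$removed" "$blanked" "$folded"
```

The [[:space:]] class is POSIX, but some very old awk builds don't support bracket classes, so treat that last form as something to check against your own awk.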
Here's some other input to consider:
$ cat file
test new first
test "a short exam to measure somebody's knowledge
or skill in something."
testing examine
test new second
test "a short exam to measure somebody's knowledge
or skill in something."
testing examine
The first one, "new first", does not have a blank at the end of the line after knowledge, and the second one, "new second", has a tab at the start of the line before or. Now let's put all of the above test cases into one file (long spaces are tabs):
$ cat file
test try
test "a short exam to measure somebody's knowledge
or skill in something."
testing examine
test try
test a short exam to measure somebody's "knowledge
or skill" in something.
testing examine
test try
test "a short exam to measure somebody's ""knowledge""
or skill in something."
testing examine
test new first
test "a short exam to measure somebody's knowledge
or skill in something."
testing examine
test new second
test "a short exam to measure somebody's knowledge
or skill in something."
testing examine
and then test that with all of the current answers. First the command from this answer, using gsub(/[[:space:]]*\n[[:space:]]*/," ",...) instead of gsub(/\n/," ",...):
$ awk -v RS='"[^"]*"' '{gsub(/[[:space:]]*\n[[:space:]]*/," ",RT); ORS=RT} 1' file
test try
test "a short exam to measure somebody's knowledge or skill in something."
testing examine
test try
test a short exam to measure somebody's "knowledge or skill" in something.
testing examine
test try
test "a short exam to measure somebody's ""knowledge"" or skill in something."
testing examine
test new first
test "a short exam to measure somebody's knowledge or skill in something."
testing examine
test new second
test "a short exam to measure somebody's knowledge or skill in something."
testing examine
$ sed ':a;N;/\n[^\t]*$/s/\n//;ta;P;D' file
test try
test "a short exam to measure somebody's knowledge or skill in something."testing examine
test try
test a short exam to measure somebody's "knowledgeor skill" in something.testing examine
test try
test "a short exam to measure somebody's ""knowledge""or skill in something."testing examinetest new firsttest "a short exam to measure somebody's knowledgeor skill in something."testing examinetest new secondtest "a short exam to measure somebody's knowledge
or skill in something."testing examine
$ awk -F'\t' '$2~/^"/{ORS=""}/"$/{ORS="\n"}1' file
test try
test "a short exam to measure somebody's knowledge or skill in something."
testing examine
test try
test a short exam to measure somebody's "knowledge
or skill" in something.
testing examine
test try
test "a short exam to measure somebody's ""knowledge""
or skill in something."
testing examine
test new first
test "a short exam to measure somebody's knowledge
or skill in something."
testing examine
test new second
test "a short exam to measure somebody's knowledge
or skill in something."
testing examine
so you can decide which is producing the behavior you want.
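If GNU awk isn't available, a different technique that should work in any POSIX awk is to keep appending lines until the number of double quotes seen so far is even. This is a swapped-in approach, not the RS/RT one above, and it's a rough sketch: the names buf, tmp, pending and joined are made up, the sample input is hypothetical, and it joins with a single blank without trimming whitespace around the newline.

```shell
# Hypothetical sample with one quoted field split across two lines.
joined=$(printf 'test\ttry\ntest\t"a short\nexam"\ntesting\texamine\n' |
awk '
{
    # Append to the pending partial record (joined by a blank) or start a new one.
    buf = (pending ? buf " " : "") $0
    # gsub() on a scratch copy returns how many double quotes buf contains.
    tmp = buf
    pending = (gsub(/"/, "", tmp) % 2)   # odd count: a quoted string is still open
    if (!pending) print buf
}
END { if (pending) print buf }           # flush an unterminated quoted string as-is
')
printf '%s\n' "$joined"
```

Because a doubled "" adds two quotes, escaped quotes inside a quoted field leave the odd/even test unaffected.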