I have a TSV (tab-separated) file with two columns. The first column is a word (or a group of words) and the second column is its meaning.
test file
test try
test "a short exam to measure somebody's knowledge
or skill in something."
testing examine
I am trying to merge the second and third lines because together they form a single value enclosed in double quotes. For example:
Expected Output
test try
test "a short exam to measure somebody's knowledge or skill in something."
testing examine
I tried this:
awk -v FS='\t' -v OFS='\t' '{print $1, $2}' test.tsv
test try
test "a short exam to measure somebody's knowledge
or skill in something."
testing examine
But it does not merge lines 2 and 3. I also tried patsplit, but that merged all the lines together:
awk 'BEGIN { FS=OFS="\t"}
{
if (patsplit($0,a,/"[^"]+"/,s)) {
gsub(/\n/,"",a[1])
printf "%s%s%s", s[0],a[1],s[1]
}
else
printf "%s", $0
printf ";"
}' test.tsv
I need to keep the tab-separated format of the original file. The only change required is to merge text that spans several lines inside a pair of double quotes.
You can set the output record separator to an empty string when the second field begins with a double quote, and set it to a newline again when the record ends with a double quote:
awk -F'\t' '$2~/^"/{ORS=""}/"$/{ORS="\n"}1'
Demo: https://awk.js.org/?snippet=nEx499
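For a quick local check, here is a minimal sketch that rebuilds the sample from the question and runs a slight variant of the command. The file names sample.tsv and merged.tsv are made up for illustration, and the variant sets ORS to a single space rather than an empty string, on the assumption that the first physical line of the quoted value has no trailing space and you want "knowledge or skill" in the result.

```shell
# Rebuild the sample from the question; sample.tsv is a hypothetical file name.
printf '%s\t%s\n' 'test' 'try' > sample.tsv
printf '%s\t%s\n' 'test' "\"a short exam to measure somebody's knowledge" >> sample.tsv
printf '%s\n' 'or skill in something."' >> sample.tsv
printf '%s\t%s\n' 'testing' 'examine' >> sample.tsv

# Assumption: ORS=" " (not "") so the two joined halves are separated by a space.
# ORS goes back to a newline as soon as a line ends with a double quote.
awk -F'\t' '$2~/^"/{ORS=" "} /"$/{ORS="\n"} 1' sample.tsv > merged.tsv
cat merged.tsv
```

If the first line of the quoted value already ends with a space in your file, the original ORS="" form should give the same result.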
To generalize this so that a multi-line value enclosed in double quotes is merged in any column, you can set the output record separator to an empty string when a line ends inside an unterminated double-quoted string, and set it to a newline again when a line contains a terminating double quote (one followed by a tab or the end of the line):
awk '/"($|\t)/{ORS="\n"}/(^|\t)"[^\t"]*$/{ORS=""}1'
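As a rough illustration of the generalized command, the sketch below feeds it a made-up three-column file (multi.tsv, a hypothetical name) in which one quoted value starts in the first column and another spans three lines in the last column. The sample lines end with a space inside the quotes, which is an assumption made so that the joined text stays readable.

```shell
# Hypothetical input: quoted values broken across lines in column 1 and column 3.
printf '%s\n' '"first ' > multi.tsv
printf '%s\t%s\t%s\n' 'key"' 'value' 'more' >> multi.tsv
printf '%s\t%s\t%s\n' 'plain' 'x' '"spans ' >> multi.tsv
printf '%s\n' 'three ' >> multi.tsv
printf '%s\n' 'lines"' >> multi.tsv

# First rule: a closing quote (followed by tab or end of line) restores the newline.
# Second rule: an opening quote left unterminated at end of line empties ORS.
awk '/"($|\t)/{ORS="\n"}/(^|\t)"[^\t"]*$/{ORS=""}1' multi.tsv > multi_merged.tsv
cat multi_merged.tsv
```

The order of the two rules matters: a line that closes one quoted value and then opens another should leave ORS empty, which is why the rule that empties it runs last.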