I've extracted a column from an Excel spreadsheet where some cells contain multiple rows, some don't. The multi-row cells are delimited by quote marks. I want to replace the returns in those multi-row cells with ", ". So my file looks something like this:
"Alpha
Beta
Gamma"
123
456
"Apple
Banana
Cherry"
789
this is a single-line cell
My goal is
Alpha, Beta, Gamma
123
456
Apple, Banana, Cherry
789
this is a single-line cell
I can write a pattern that captures those multi-line cells using a nested pattern, but I don't know how to "reach inside" to get at the backreference.
"(([^"]+)\n)+([^"]+)"
Try:
$ sed '/^"/{:a; /"$/bb; N; ba; :b; s/\n//g}' file
"Alpha Beta Gamma"
123
456
"Apple Banana Cherry"
789
this is a single-line cell
/^"/{...}
For any lines that begins with "
, the commands in curly braces will be performed.
:a
This creates a label a
.
/"$/bb
If the current pattern space ends with a "
, then branch to label b
.
N
Read a new line into the pattern space.
ba
Branch back to label a
.
:b
This defines label b
.
s/\n//g
Remove any newline characters from the pattern space.