
Print all lines but concatenate consecutive lines matching a pattern


I have a file that contains the following data:

;Citation1 begins here and contains characters including , . and numbers

DATA 1 259.85 101000 0.094837707 0.9089 / 
         2 266.07 101000 0.097842938 0.8997 / 
         3 270.95 101000 0.105071894 0.8899 / 
         4 273.35 101000 0.112016587 0.8849 / 
         5 278.75 101000 0.134569045 0.87 / 

;Citation2 begins here and contains characters including , . and numbers but
;this one continues on the next line

DATA 1 259.85 101000 0.094837707 0.9089 / 
         2 266.07 101000 0.097842938 0.8997 / 
         3 270.95 101000 0.105071894 0.8899 / 
         4 273.35 101000 0.112016587 0.8849 / 
         5 278.75 101000 0.134569045 0.87 / 

I would like to print all of the lines to a new file. However, when consecutive lines begin with the same character (here ";"), I would like to join them into a single line, separated by a space and without repeating the leading ";". The above input file would therefore appear as:

;Citation1 begins here and contains characters including , . and numbers

DATA 1 259.85 101000 0.094837707 0.9089 / 
         2 266.07 101000 0.097842938 0.8997 / 
         3 270.95 101000 0.105071894 0.8899 / 
         4 273.35 101000 0.112016587 0.8849 / 
         5 278.75 101000 0.134569045 0.87 / 

;Citation2 begins here and contains characters including , . and numbers but this one continues on the next line

DATA 1 259.85 101000 0.094837707 0.9089 / 
         2 266.07 101000 0.097842938 0.8997 / 
         3 270.95 101000 0.105071894 0.8899 / 
         4 273.35 101000 0.112016587 0.8849 / 
         5 278.75 101000 0.134569045 0.87 / 

I have tried using different variations of awk commands such as:

awk '/;/ && last {printf "%s","\n"last;$0}{printf "%s",$0}END{print} /;/{last=$0}' input.txt > output.txt

but have not been successful.


Solution

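    $ awk '
        {
            curr = $0    # keep the unmodified line; sub() below may change $0
            # If the previous line began with ";" and this one does too, strip
            # the leading ";" and join with OFS (a space) instead of a newline.
            # ors is empty for the first line, so no leading newline is printed.
            printf "%s%s", ( (prev ~ /^;/) && sub(/^;/,"") ? OFS : ors ), $0
            ors = ORS
            prev = curr
        }
        END { print "" }    # terminate the last line
    ' file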
    ;Citation1 begins here and contains characters including , . and numbers
    
    DATA 1 259.85 101000 0.094837707 0.9089 /
             2 266.07 101000 0.097842938 0.8997 /
             3 270.95 101000 0.105071894 0.8899 /
             4 273.35 101000 0.112016587 0.8849 /
             5 278.75 101000 0.134569045 0.87 /
    
    ;Citation2 begins here and contains characters including , . and numbers but this one continues on the next line
    
    DATA 1 259.85 101000 0.094837707 0.9089 /
             2 266.07 101000 0.097842938 0.8997 /
             3 270.95 101000 0.105071894 0.8899 /
             4 273.35 101000 0.112016587 0.8849 /
             5 278.75 101000 0.134569045 0.87 /
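
To try this without the full data file, here is a minimal sketch that wraps the same awk program in a shell function and feeds it a shortened copy of the sample from the question. The function name `join_marked_lines` and the variable `result` are placeholders I made up for illustration. It assumes a POSIX shell and any standard awk.

```shell
#!/bin/sh
# join_marked_lines is a hypothetical helper name, not part of the answer above.
# It reads text on stdin and writes the joined text to stdout.
join_marked_lines() {
    awk '
        {
            curr = $0
            printf "%s%s", ( (prev ~ /^;/) && sub(/^;/,"") ? OFS : ors ), $0
            ors = ORS
            prev = curr
        }
        END { print "" }
    '
}

# Shortened sample taken from the question: two consecutive ";" lines,
# a blank line, and two data lines.
result=$(join_marked_lines <<'EOF'
;Citation2 begins here and contains characters including , . and numbers but
;this one continues on the next line

DATA 1 259.85 101000 0.094837707 0.9089 /
         2 266.07 101000 0.097842938 0.8997 /
EOF
)

printf '%s\n' "$result"
```

The two ";" lines should come out as one line, with the blank line and the data lines unchanged. To write the result to a new file, as the question asks, redirect the output, for example `join_marked_lines < input.txt > output.txt`.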