Tags: sed, terminal, grep, posix

Sanitise code output from grep, replacing multiple whitespace after a range of characters


Answer: Thanks to Jerry Jeremiah, I have a solution. The end result is this:

grep -E '^\S{8} \S' test.lst | awk -F';' '{print substr($1,1,35)gensub("[[:space:]]+"," ","g",substr($1,36));}'

It requires gawk to be installed, because gensub() is a GNU awk extension.
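If portability beyond GNU tools matters, note that \S in the grep pattern is a GNU extension as well. A possible portable spelling of the same filter, assuming the intent is "eight non-whitespace characters, one space, then a non-whitespace character", uses a POSIX bracket expression instead. The function name listing_lines is made up for this sketch and does not come from the original post.

```shell
# Hypothetical helper (name is illustrative): keep only listing lines that
# start with an 8-character address, one space, and then some opcode text.
listing_lines() {
    # [^[:space:]] is the POSIX bracket-expression equivalent of GNU's \S
    grep -E '^[^[:space:]]{8} [^[:space:]]' "$@"
}

# Reads stdin when no file is given; only the first line below passes the filter.
printf '%s\n' '00000214 6600   bne.s SkipSetup' '; a comment-only line' | listing_lines
```

With a file argument, as in the command above, it would be called as listing_lines test.lst and piped into awk as before.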

Original Question: I have a file whose output I want to sanitise and then diff, but I'm having trouble coming up with a working regex to do what I want.

Basically, I want to ignore the first 36 characters and then, starting from the first non-whitespace character after that, replace every run of multiple whitespace characters with a single space, strip any line comment off the end (comments start with a ;), and remove any trailing whitespace.

I just can't figure out a pattern that works while ignoring those first 36 characters. Any time I use a capture group like (\S*([^\s]\s+))*, it only ever returns the last match.
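The "last match only" behaviour is how a repeated capture group works in general: the group may match several times, but a back-reference holds only its final iteration. A small sketch with a simplified pattern (not the one from the question) shows the effect; the function name last_iteration is purely illustrative.

```shell
# Illustrative only: the inner group ([a-z]) matches "a" and then "b",
# but \2 refers to just the final iteration of the repeated outer group.
last_iteration() {
    sed -E 's/^(([a-z]) +)*/[\2]/'
}

# The regex consumes "a  b  " in two iterations; only "b" survives in \2.
printf '%s\n' 'a  b  c' | last_iteration
```

This is why a single substitution with a repeated group cannot collapse every whitespace run, and why a global replacement on just the tail of the line is the easier route.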

This is an example of the code I'm grepping into sed:

00000000 =00A00000                  z80_ram:        equ $A00000 ; start of Z80 RAM
00000000 =00A000EA                  z80_dac3_pitch:     equ $A000EA
00000000 =00A01FFD                  z80_dac_status:     equ $A01FFD
00000000 =00A01FFF                  z80_dac_sample:     equ $A01FFF
00000000 =00A02000                  z80_ram_end:        equ $A02000 ; end of non-reserved Z80 RAM
00000000 =00A10001                      z80_version:        equ $A10001
00000000 =00A10002                  z80_port_1_data:    equ $A10002
00000000 =00A10008                               z80_port_1_control:    equ $A10008
00000000 =00A1000A                  z80_port_2_control: equ $A1000A
00000000 =00A1000C                     z80_expansion_control:   equ $A1000C
00000000 =00A11100                  z80_bus_request:    equ $A11100
00000000 =00A11200                  z80_reset:      equ $A11200
00000000 =00A04000                  ym2612_a0:      equ $A04000
00000000 =00A04001                  ym2612_d0:      equ $A04001
00000000 =00A04002                  ym2612_a1:      equ $A04002
00000000 =00A04003                  ym2612_d1:      equ $A04003
00000000 =00A14000                         security_addr:       equ $A14000
00000214 6600                               bne.s   SkipSetup ; Skip the VDP and Z80 setup code if port A, B or C is ok...?
00000216 4BFA 0000                          lea SetupValues(pc),a5  ; Load setup values array address.
0000021A 4C9D 00E0                          movem.w (a5)+,d5-d7
0000021E 4CDD 1F00                          movem.l (a5)+,a0-a4
00000222 1029 EF01                          move.b  -$10FF(a1),d0   ; get hardware version (from $A10001)
00000226 0200 000F                          andi.b  #$F,d0
0000022A 6700                               beq.s   SkipSecurity    ; If the console has no TMSS, skip the security stuff.
0000022C 237C 5345 4741 2F00                move.l  #'SEGA',$2F00(a1) ; move "SEGA" to TMSS register ($A14000)

The output I want is this:

00000000 =00A00000                  z80_ram: equ $A00000
00000000 =00A000EA                  z80_dac3_pitch: equ $A000EA
00000000 =00A01FFD                  z80_dac_status: equ $A01FFD
00000000 =00A01FFF                  z80_dac_sample: equ $A01FFF
00000000 =00A02000                  z80_ram_end: equ $A02000
00000000 =00A10001                  z80_version: equ $A10001
00000000 =00A10002                  z80_port_1_data: equ $A10002
00000000 =00A10008                  z80_port_1_control: equ $A10008
00000000 =00A1000A                  z80_port_2_control: equ $A1000A
00000000 =00A1000C                  z80_expansion_control: equ $A1000C
00000000 =00A11100                  z80_bus_request: equ $A11100
00000000 =00A11200                  z80_reset: equ $A11200
00000000 =00A04000                  ym2612_a0: equ $A04000
00000000 =00A04001                  ym2612_d0: equ $A04001
00000000 =00A04002                  ym2612_a1: equ $A04002
00000000 =00A04003                  ym2612_d1: equ $A04003
00000000 =00A14000                  security_addr: equ $A14000
00000214 6600                       bne.s SkipSetup
00000216 4BFA 0000                  lea SetupValues(pc),a5
0000021A 4C9D 00E0                  movem.w (a5)+,d5-d7
0000021E 4CDD 1F00                  movem.l (a5)+,a0-a4
00000222 1029 EF01                  move.b -$10FF(a1),d0
00000226 0200 000F                  andi.b #$F,d0
0000022A 6700                       beq.s SkipSecurity
0000022C 237C 5345 4741 2F00        move.l #'SEGA',$2F00(a1)

Solution

  • You may use awk like this:

    awk -F';' '{a=substr($1,1,35); b=substr($1,36); gsub("[[:space:]]+"," ",b);print a b;}' file > outfile
    

    Details

    • -F';' - field separator set to ;
    • a=substr($1,1,35) - set variable a to the first 35 characters of Field 1
    • b=substr($1,36) - set variable b to the rest of Field 1, from character 36 onwards
    • gsub("[[:space:]]+"," ",b) - replace all chunks of 1 or more whitespace chars with a single regular space char in the b variable only
    • print a b - print concatenated a and b variable values.
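The question also asks for trailing whitespace to be removed. Collapsing runs of whitespace turns any trailing run in b into a single space rather than deleting it, so lines that ended in a comment or in spaces keep one trailing space. A minimal sketch of one way to handle that, assuming the same 35/36 column split as above; the function name sanitise is hypothetical and the extra sub() call is an addition, not part of the original answer.

```shell
# Hypothetical wrapper (name is illustrative) around the awk command above.
sanitise() {
    awk -F';' '{
        a = substr($1, 1, 35)          # fixed-width address/opcode columns, kept verbatim
        b = substr($1, 36)             # source text; -F";" has already cut off any comment
        gsub(/[[:space:]]+/, " ", b)   # collapse each whitespace run to one space
        sub(/ $/, "", b)               # added step: drop the one trailing space left behind
        print a b
    }' "$@"
}

# Build a sample listing line with printf so the column widths are explicit.
printf '%-35s %s\n' '00000000 =00A00000' 'z80_ram:        equ $A00000 ; start of Z80 RAM' | sanitise
```

Only sub() and gsub() are used here, so this variant should not need gawk. It can be combined with the grep filter in the same way as the original command, for example grep ... test.lst | sanitise > outfile.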