linux awk sed tr

Format text with sed or awk


I am trying to reformat the actual output below so that the details for each disk appear on a single line:

0. ct1d0 <INTEL-ADDPF2KX076T9S-2CV1-6.19TB>
   /pci@4,0/pci8086,347c@4/e,487c@0/disk@1
   /dev/chassis/SYS/DBP/HDD0/NVME/disk
1. c2t1d0 <INTEL-ADDPF2KX076T9S-2CV1-6.19TB>
   /pci@4,0/pci8086,347d@5/apci108e,487c@0/disk@1
   /dev/chassis/DBP/HDD1/NVME/disk
2. c3t0d0 <ATA-Min_5300_MAAAD-D3MU-223.57GB>
   /pci@0,0/pci8e,4872@17/disk@0,0
   /dev/chassis/MB/SSDR0/SSD0/disk
3. c4t2d0 <ATA-Min_5300_MTFD-D3MU-223.57GB>
   /pci@0,0/pci08e,4872@17/disk@2,0
   /dev/chassis/SYS/MB/SSDR0/SSD1/disk

The expected output is:

0. ct1d0 <INTEL-ADDPF2KX076T9S-2CV1-6.19TB>| /pci@4,0/pci8086,347c@4/e,487c@0/disk@1| /dev/chassis/SYS/DBP/HDD0/NVME/disk|
1. c2t1d0 <INTEL-ADDPF2KX076T9S-2CV1-6.19TB>| /pci@4,0/pci8086,347d@5/apci108e,487c@0/disk@1| /dev/chassis/DBP/HDD1/NVME/disk|
2. c3t0d0 <ATA-Min_5300_MAAAD-D3MU-223.57GB>| /pci@0,0/pci108e,4872@17/disk@0,0| /dev/chassis/MB/SSDR0/SSD0/disk|
3. c4t2d0 <ATA-Min_5300_MTFD-D3MU-223.57GB>| /pci@0,0/pci108e,4872@17/disk@2,0| /dev/chassis/SYS/MB/SSDR0/SSD1/disk|

I tried the following:

cat actual_output | tr -s " " | tr "\n" "|"

This puts everything on a single line:

0. ct1d0 <INTEL-ADDPF2KX076T9S-2CV1-6.19TB>| /pci@4,0/pci8086,347c@4/e,487c@0/disk@1| /dev/chassis/SYS/DBP/HDD0/NVME/disk|1. c2t1d0 <INTEL-ADDPF2KX076T9S-2CV1-6.19TB>| /pci@4,0/pci8086,347d@5/apci108e,487c@0/disk@1| /dev/chassis/DBP/HDD1/NVME/disk|2. c3t0d0 <ATA-Micron_5300_MAAAD-D3MU-223.57GB>| /pci@0,0/pci108e,4872@17/disk@0,0| /dev/chassis/MB/SSDR0/SSD0/disk|3. c4t2d0 <ATA-Micron_5300_MTFD-D3MU-223.57GB>| /pci@0,0/pci108e,4872@17/disk@2,0| /dev/chassis/SYS/MB/SSDR0/SSD1/disk|

Now I need a newline (\n) inserted before each numbered entry (after the end of entry 0. and before the start of entry 1., and so on) to get the expected result. Is there a regex that can do this?

TIA


Solution

  • Modifying one data set (disk 2) to have only 2 lines:

    $ cat disk.dat
    0. ct1d0 <INTEL-ADDPF2KX076T9S-2CV1-6.19TB>
       /pci@4,0/pci8086,347c@4/e,487c@0/disk@1
       /dev/chassis/SYS/DBP/HDD0/NVME/disk
    1. c2t1d0 <INTEL-ADDPF2KX076T9S-2CV1-6.19TB>
       /pci@4,0/pci8086,347d@5/apci108e,487c@0/disk@1
       /dev/chassis/DBP/HDD1/NVME/disk
    2. c3t0d0 <ATA-Min_5300_MAAAD-D3MU-223.57GB>
       /pci@0,0/pci8e,4872@17/disk@0,0
    3. c4t2d0 <ATA-Min_5300_MTFD-D3MU-223.57GB>
       /pci@0,0/pci08e,4872@17/disk@2,0
       /dev/chassis/SYS/MB/SSDR0/SSD1/disk
    

    Extending OP's current code:

    cat disk.dat | tr -s " " | tr "\n" "|" | sed -E "s/\|([0-9])/\|\n\1/g; s/$/\n/"
    

    Where:

    • the 1st half of the sed script places a \n between a pipe (|) and a number ([0-9]); \n in the replacement is a GNU sed extension
    • the 2nd half of the sed script adds a \n at the end of the line
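The same pipeline can be wrapped in a small function and run against an inline sample to see what each stage does. This is only a sketch: the function name join_disk_lines is made up for illustration, the sample is the first two disks from the question, and GNU sed is assumed (other sed implementations may treat \n in the replacement differently).

```shell
# Hypothetical helper name; assumes GNU sed, where \n in the
# replacement text means a newline.
join_disk_lines() {
    tr -s ' ' |       # squeeze each run of spaces down to one space
    tr '\n' '|' |     # join every line into one, separated by pipes
    sed -E 's/\|([0-9])/|\n\1/g; s/$/\n/'   # split again before each leading digit, then end with a newline
}

# Two-disk sample copied from the question's input
sample='0. ct1d0 <INTEL-ADDPF2KX076T9S-2CV1-6.19TB>
   /pci@4,0/pci8086,347c@4/e,487c@0/disk@1
   /dev/chassis/SYS/DBP/HDD0/NVME/disk
1. c2t1d0 <INTEL-ADDPF2KX076T9S-2CV1-6.19TB>
   /pci@4,0/pci8086,347d@5/apci108e,487c@0/disk@1
   /dev/chassis/DBP/HDD1/NVME/disk'

printf '%s\n' "$sample" | join_disk_lines
```

Because the split relies on a pipe being immediately followed by a digit, it only works while detail lines keep at least one leading space after the squeeze.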

    An alternative awk idea:

    awk -F'.' '                                        # input field delimiter is a period
               { sub(/^[[:space:]]+/,"",$1) }          # remove leading white space from 1st field
    ($1+0)==$1 { if (NR>1) print ""; pfx="" }          # if 1st field is numeric; if beyond 1st row then terminate previous line of output; reset prefix to empty string
               { printf "%s%s|", pfx, $0; pfx=" " }    # print prefix plus current line; reset prefix to a single space
    END        { if (NR>=1) print "" }                 # if we had at least one row of input then terminate previous line of output
    ' disk.dat
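The ($1+0)==$1 test is the part of the awk script that tells header lines from detail lines, so it may help to look at it in isolation. The sketch below uses a made-up function name, classify_first_field, and simply labels each input line. It relies on awk treating an input field that looks like a number as numeric in comparisons, while anything else is compared as a string.

```shell
# Hypothetical helper: report whether the first period-delimited
# field of each line passes the same numeric test as the script above.
classify_first_field() {
    awk -F'.' '{ print ((($1+0)==$1) ? "header" : "detail") }'
}

# One header line and one detail line from the question's input
printf '%s\n' \
    '0. ct1d0 <INTEL-ADDPF2KX076T9S-2CV1-6.19TB>' \
    '   /pci@4,0/pci8086,347c@4/e,487c@0/disk@1' |
    classify_first_field
```

For the header line the first field is 0, which equals itself numerically. For the detail line the whole line is the first field (it contains no period), so the comparison falls back to a string comparison and fails.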
    

    Both the sed and awk approaches generate:

    0. ct1d0 <INTEL-ADDPF2KX076T9S-2CV1-6.19TB>| /pci@4,0/pci8086,347c@4/e,487c@0/disk@1| /dev/chassis/SYS/DBP/HDD0/NVME/disk|
    1. c2t1d0 <INTEL-ADDPF2KX076T9S-2CV1-6.19TB>| /pci@4,0/pci8086,347d@5/apci108e,487c@0/disk@1| /dev/chassis/DBP/HDD1/NVME/disk|
    2. c3t0d0 <ATA-Min_5300_MAAAD-D3MU-223.57GB>| /pci@0,0/pci8e,4872@17/disk@0,0|
    3. c4t2d0 <ATA-Min_5300_MTFD-D3MU-223.57GB>| /pci@0,0/pci08e,4872@17/disk@2,0| /dev/chassis/SYS/MB/SSDR0/SSD1/disk|
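If the header lines may themselves be indented, a variant that matches headers by pattern instead of by field value avoids modifying $1 (which would make awk rebuild the line with the output field separator). This is a sketch, not part of the original answer: the function name join_by_header is hypothetical, a header is assumed to be any line that starts with digits followed by a period after optional leading blanks, and the two-space indentation on the sample headers was added here for illustration.

```shell
# Hypothetical helper: start a new output line at each "<digits>." header
# and append every following detail line to it.
join_by_header() {
    awk '
        { sub(/^[ \t]+/, "") }      # strip leading blanks from the whole line
        $0 == "" { next }           # ignore empty lines
        /^[0-9]+\./ {               # header line: flush the previous record, start a new one
            if (out != "") print out
            out = $0 "|"
            next
        }
        { out = out " " $0 "|" }    # detail line: append to the current record
        END { if (out != "") print out }   # flush the last record
    '
}

# Disks 2 and 3 from disk.dat, with the headers indented for illustration
sample='  2. c3t0d0 <ATA-Min_5300_MAAAD-D3MU-223.57GB>
     /pci@0,0/pci8e,4872@17/disk@0,0
  3. c4t2d0 <ATA-Min_5300_MTFD-D3MU-223.57GB>
     /pci@0,0/pci08e,4872@17/disk@2,0
     /dev/chassis/SYS/MB/SSDR0/SSD1/disk'

printf '%s\n' "$sample" | join_by_header
```

Since records are delimited by the header pattern, a disk with two lines and a disk with three lines are handled the same way.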