
Split each row in a file into fixed-length substrings separated by a delimiter


I need help converting a file into a new file that meets the following requirements:

  1. Split each row (a long string) into substrings of fixed lengths.
  2. Use a pipe delimiter "|" between the substrings.
  3. Leave the last, undefined column (substring) as-is, but add "|" before it.

Here is an example. Suppose a file (test.dat) has 2 rows:

PG123ABCD A 000{000
MK789HJKL32H00

Column 1: length(2)
Column 2: length(3) 
Column 3: length(4)
Column 4: length(3)
Column 5: undefined, use all remaining characters

Below is the final output I need. The example has only 2 rows, but suppose the real file has 1k+ similar rows, and I need to convert the original file to a new file based on the requirements above.

PG|123|ABCD| A |000{000
MK|789|HJKL|32H|00

Solution

  • cut -b 1-2,3-5,6-9,10-12,13-500 --output-delimiter='|' test.dat > 1.dat
    

    I wrote the code above and it outputs exactly what I need.

    The only question I have is about the last column. I used 13-500 as a fixed range for the undefined column, but the length of the remaining string varies from row to row. Is there a generic way to define the last column's length, e.g., something like 13-max_length_of_the_row?
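
On the open question about the last column: cut accepts an open-ended range, where `N-` means "from byte N to the end of the line", so `13-` can stand in for `13-500`. Below is a minimal sketch, assuming GNU cut (the `--output-delimiter` option is a GNU extension and may not exist in other implementations). It recreates the two sample rows from the question so it can be run on its own.

```shell
# Recreate the two sample rows from the question.
printf '%s\n' 'PG123ABCD A 000{000' 'MK789HJKL32H00' > test.dat

# "13-" is an open-ended range: byte 13 through the end of each line,
# so no guessed upper bound such as 500 is needed.
cut -b 1-2,3-5,6-9,10-12,13- --output-delimiter='|' test.dat > 1.dat

# Show the converted file.
cat 1.dat
```

With the sample rows, 1.dat should match the two expected output lines shown in the question. Note that `-b` counts bytes, so the column widths only line up with characters when the data is single-byte (e.g. plain ASCII).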