regex, bash, shell, unix, carriage-return

Insert a carriage return before a regular expression in a text file using the Unix shell


I have a messy text file (about 30 KB) containing data that I have to reorganize using a shell script. The file exhibits a simple pattern: a "parameter number" (a value between 10001 and 10999) is followed by several other values (floats), all separated by spaces. I would like each line of the file to hold exactly one "parameter number" followed by its values, still separated by spaces.

My problem is easy to understand.

The "messy" file looks like this:

10001 x(1,1) x(1,2) ... x(1,n) 10002 x(2,1) x(2,2) ... x(2,n) 10003 x(3,1) x(3,2) ... x(3,n) [..and so on to..] 10999 x(999,1) x(999,2) ... x(999,n) 

where the x(i,j) are floats.

I would like it to be:

10001 x(1,1) x(1,2) ... x(1,n) 
10002 x(2,1) x(2,2) ... x(2,n) 
10003 x(3,1) x(3,2) ... x(3,n) 
...
10999 x(999,1) x(999,2) ... x(999,n) 

I would like to write a bash script (or a simple command) that replaces the space before each occurrence of the pattern 10[0-9][0-9][0-9] (regex) with a carriage return (that is, a newline).

Bash scripting and regular expressions are new to me, and I can't figure out an easy solution.

I am thinking about using the bash ${string//substring/newsubstring} parameter expansion, but I still don't know how to say "the space that precedes the pattern 10[0-9][0-9][0-9]" in regex.


Solution

  • I would like to write a bash script (or a simple command) that replaces the "space" before the pattern 10[0-9][0-9][0-9] (regex) by a carriage return.

    You could use sed.

    sed 's/[[:space:]]\(10[0-9][0-9][0-9]\)/\n\1/g' file
    

    Or, if the separator is always a single literal space rather than any whitespace character:

    sed 's/ \(10[0-9][0-9][0-9]\)/\n\1/g' file
    

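    In sed's default basic regular expressions, a capturing group is written \(..\), and \1 in the replacement refers back to what the group matched. Note that \n in the replacement is turned into a newline by GNU sed; other sed implementations may not interpret it that way.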

    Example:

    $ cat file
    10001 x(1,1) x(1,2) ... x(1,n) 10002 x(2,1) x(2,2) ... x(2,n) 10003 x(3,1) x(3,2) ... x(3,n) [..and so on to..] 10999 x(999,1) x(999,2) ... x(999,n) 
    $ sed 's/[[:space:]]\(10[0-9][0-9][0-9]\)/\n\1/g' file
    10001 x(1,1) x(1,2) ... x(1,n)
    10002 x(2,1) x(2,2) ... x(2,n)
    10003 x(3,1) x(3,2) ... x(3,n) [..and so on to..]
    10999 x(999,1) x(999,2) ... x(999,n)
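
Where \n in the replacement is not interpreted as a newline (that behaviour comes from GNU sed), a backslash followed by a literal newline is the portable way to write the line break. The following is only a minimal sketch under that assumption: the sample numbers and the variable names `input` and `result` are made up for illustration and do not come from the question's file.

```shell
# Hypothetical sample data in the shape described in the question:
# a 10xxx parameter number followed by its float values.
input='10001 1.5 2.5 10002 3.5 4.5 10003 5.5 6.5'

# Replace each space that precedes a 10xxx number with a line break.
# In the replacement, a backslash followed by an actual newline inserts
# a newline; \1 puts the captured parameter number back.
result=$(printf '%s\n' "$input" | sed 's/ \(10[0-9][0-9][0-9]\)/\
\1/g')

printf '%s\n' "$result"
```

To process a real file, pass the file name as the last argument to sed instead of piping in `input`, and redirect the output to a new file. The pattern would also match the start of a value such as 10234.5, so this sketch assumes that no float in the data begins with 10 followed by three digits.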