I have a messy text file (about 30 KB) containing data that I have to reorganize using a shell script. The file exhibits a simple pattern: a "parameter number" (a value between 10001 and 10999) is followed by several other values (floats). Values are separated by a space. I would like each line of my file to contain one "parameter number" followed by its values (only one "parameter number" per line), with values still separated by a space.
My problem is easy to understand.
The "messy" file looks like this:
10001 x(1,1) x(1,2) ... x(1,n) 10002 x(2,1) x(2,2) ... x(2,n) 10003 x(3,1) x(3,2) ... x(3,n) [..and so on to..] 10999 x(999,1) x(999,2) ... x(999,n)
where the x(i,j) are floats.
I would like it to look like this:
10001 x(1,1) x(1,2) ... x(1,n)
10002 x(2,1) x(2,2) ... x(2,n)
10003 x(3,1) x(3,2) ... x(3,n)
...
10999 x(999,1) x(999,2) ... x(999,n)
I would like to write a bash script (or a simple command) that replaces the space before the pattern 10[0-9][0-9][0-9] (regex) with a newline.
Bash scripting and regexes are new to me, and I can't figure out an easy solution.
I am thinking about using the bash ${string//substring/newsubstring} parameter expansion, but I still don't know how to say "the space that precedes the pattern 10[0-9][0-9][0-9]" in regex.
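Since ${string//pattern/replacement} matches glob patterns rather than regular expressions, one way to stay in pure bash is to test each whitespace-separated token with [[ =~ ]] and start a new output line whenever a whole token is a parameter number. This is a sketch that assumes parameter numbers are standalone 5-digit tokens matching 10[0-9][0-9][0-9] and that no float is written as such a bare token; the function name split_params_bash is hypothetical.

```shell
# Hypothetical helper (bash): reads the messy data on stdin and prints one
# "parameter number + values" record per line on stdout.
# Assumption: a parameter number is a whole token matching 10[0-9][0-9][0-9].
split_params_bash() {
  local raw tok line=""
  local -a tokens
  # Read line by line; "|| [[ -n $raw ]]" also handles a last line without a
  # trailing newline.
  while IFS= read -r raw || [[ -n $raw ]]; do
    # Split the current input line into whitespace-separated tokens.
    read -r -a tokens <<< "$raw"
    for tok in "${tokens[@]}"; do
      if [[ $tok =~ ^10[0-9]{3}$ ]]; then
        # A parameter number starts a new record: print the previous one.
        if [[ -n $line ]]; then
          printf '%s\n' "$line"
        fi
        line=$tok
      else
        # Append the value, adding a separating space only if needed.
        line+=${line:+ }$tok
      fi
    done
  done
  # Print the last record.
  if [[ -n $line ]]; then
    printf '%s\n' "$line"
  fi
}
```

Usage: split_params_bash < file > file.new. Because it compares whole tokens, a float such as 10234.5 is not mistaken for a parameter number (the space-before-pattern regex used with sed would split in front of it), and records spread over several input lines are joined.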
You could use sed.
sed 's/[[:space:]]\(10[0-9][0-9][0-9]\)/\n\1/g' file
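or, matching only a literal space:
sed 's/ \(10[0-9][0-9][0-9]\)/\n\1/g' file
In basic regular expressions, which sed uses by default, a capturing group is written \(...\), and \1 in the replacement inserts the text it captured. The parameter number is therefore kept, and only the space in front of it is replaced with a newline.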
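The \n in the replacement is a GNU sed extension: POSIX leaves it unspecified, and some other implementations (for example the BSD sed on macOS) do not turn it into a line break. A more portable sketch, assuming only POSIX sed, uses a backslash followed by a literal newline instead; the wrapper name split_params is hypothetical.

```shell
# Hypothetical wrapper: POSIX sed inserts a line break when the replacement
# contains a backslash immediately followed by a real newline.
# Reads the files given as arguments, or standard input if there are none.
split_params() {
  sed 's/[[:space:]]\(10[0-9][0-9][0-9]\)/\
\1/g' "$@"
}
```

Usage: split_params file > file.new, or pipe the data into it. GNU sed accepts this form as well, so the same command works in both environments.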
Example:
$ cat file
10001 x(1,1) x(1,2) ... x(1,n) 10002 x(2,1) x(2,2) ... x(2,n) 10003 x(3,1) x(3,2) ... x(3,n) [..and so on to..] 10999 x(999,1) x(999,2) ... x(999,n)
$ sed 's/[[:space:]]\(10[0-9][0-9][0-9]\)/\n\1/g' file
10001 x(1,1) x(1,2) ... x(1,n)
10002 x(2,1) x(2,2) ... x(2,n)
10003 x(3,1) x(3,2) ... x(3,n) [..and so on to..]
10999 x(999,1) x(999,2) ... x(999,n)
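To avoid overwriting the original data and to catch input that does not follow the expected pattern, the output can be written to a separate file and then checked. This is a sketch that reuses the GNU sed command above; the function name split_to_new_file and the .new suffix are hypothetical choices.

```shell
# Hypothetical helper: writes the reorganized data to "<input>.new" and reports
# any output line that does not start with a parameter number.
split_to_new_file() {
  local in=$1 out=$1.new
  sed 's/[[:space:]]\(10[0-9][0-9][0-9]\)/\n\1/g' "$in" > "$out" || return 1
  # grep -v prints the lines that do NOT match; it exits 0 only if it printed some.
  if grep -Ev '^10[0-9]{3}([[:space:]]|$)' "$out" >&2; then
    echo "warning: the lines above do not start with a parameter number" >&2
    return 1
  fi
}
```

Usage: split_to_new_file file, then inspect file.new and replace the original only once the check passes.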