I have a data, which is not formatted according to proper field separator (which is awk good at). But, what I know is, the data is fixed width.
NODE S1 S2 S3 SINT SEQV
1 0.14919 -0.58396E-001-0.71230 0.86149 0.77873
2 0.56037E-001 0.23261E-002-0.37154 0.42757 0.40341
3 0.52036E-001 0.19762E-001-0.27222 0.32426 0.30939
4 0.59765E-001 0.22059E-001-0.24529 0.30505 0.28806
5 0.70704E-001-0.51976E-002-0.13862 0.20932 0.18354
6 0.11906 0.44607E-001-0.17493 0.29399 0.26474
7 0.25540 0.95993E-002-0.43110 0.68650 0.60246
8 0.52246E-001-0.47008E-001-0.35167 0.40391 0.36456
9 0.32215E-001-0.62291E-001-0.28800 0.32021 0.28497
10 0.28072E-001-0.68269E-001-0.28304 0.31111 0.27586
11 0.25990E-001-0.78663E-001-0.28626 0.31225 0.27527
12 0.26657E-001-0.79217E-001-0.29507 0.32173 0.28400
The expected output is something like this (the numbers in other columns can have format like in S2, in other parts of the data):
NODE S1 S2 S3 SINT SEQV
1 0.14919 -0.58396E-001 -0.71230 0.86149 0.77873
2 0.56037E-001 0.23261E-002 -0.37154 0.42757 0.40341
3 0.52036E-001 0.19762E-001 -0.27222 0.32426 0.30939
4 0.59765E-001 0.22059E-001 -0.24529 0.30505 0.28806
5 0.70704E-001 -0.51976E-002 -0.13862 0.20932 0.18354
6 0.11906 0.44607E-001 -0.17493 0.29399 0.26474
7 0.25540 0.95993E-002 -0.43110 0.68650 0.60246
8 0.52246E-001 -0.47008E-001 -0.35167 0.40391 0.36456
9 0.32215E-001 -0.62291E-001 -0.28800 0.32021 0.28497
10 0.28072E-001 -0.68269E-001 -0.28304 0.31111 0.27586
11 0.25990E-001 -0.78663E-001 -0.28626 0.31225 0.27527
12 0.26657E-001 -0.79217E-001 -0.29507 0.32173 0.28400
The main problem is, sometimes the column are separated by space, sometime no space ('-' sign occupies the space), that is, no proper field separator. I found a similar question here, but that data was consistent in terms of field separator. There might be two ways, according to my thinking,
I'm new to AWK, I know it might be very easy with some other tools, but I would like to know if it's possible to separate/extract these columns using awk. I am using awk in terminal in MacOS.
Although the answers by JNevill & Andriy Makukha are fine in their own rights, I wanted to solve the problem specifically using NAWK (MacOS default AWK). I have found that "FIELDWIDTHS" is a gawk only feature Grymoire,machelp.
The catch is to use
FS=""
which would consider each character as a field. As the fields are of equal width, it's possible to extract each column when one knows the number of characters in each column. For example, if I want to extract the first, second, third and the last column, I can use the following code (if the source data is named as: test_input.txt):
awk 'BEGIN{FS=""}{print $1$2$3$4"\t"$5$6$7$8$9$10$11$12$13$14$15$16$17$18"\t"$19$20$21$22$23$24$25$26$27$28$29$30$31"\t"$59$60$61$62$63$64$65$66$67$68$69$70}' test_input.txt
output:
NODE S1 S2 SEQV
1 0.14919 -0.58396E-001 0.77873
2 0.56037E-001 0.23261E-002 0.40341
3 0.52036E-001 0.19762E-001 0.30939
4 0.59765E-001 0.22059E-001 0.28806
5 0.70704E-001 -0.51976E-002 0.18354
6 0.11906 0.44607E-001 0.26474
7 0.25540 0.95993E-002 0.60246
8 0.52246E-001 -0.47008E-001 0.36456
9 0.32215E-001 -0.62291E-001 0.28497
10 0.28072E-001 -0.68269E-001 0.27586
11 0.25990E-001 -0.78663E-001 0.27527
12 0.26657E-001 -0.79217E-001 0.28400
This is a simple but somewhat unsophisticated solution, but it works for me as of now, for large data. Any further refinement would be welcome...