I have a couple of files.
cat << EOF > file1.txt
20190130150000 #timestamp-1
56 39
P A
S-4 B 4. E F 3
-9999 -9999 6.99 0.000 -9999 -9999
-9999 -9999 9.99 0.000 -9999 -9999
20190130151000 #timestamp-2
56 39
P A
S-4 B 4. E F 3
-9999 -9999 6.99 0.000 -9999 -9999
-9999 -9999 9.99 0.000 -9999 -9999
20190130152000 #timestamp-3
56 39
P A
S-4 B 4. E F 3
-9999 -9999 6.99 0.000 -9999 -9999
-9999 -9999 9.99 0.000 -9999 -9999
EOF
cat << EOF > file2.txt
20190130150000 #timestamp-1
33 55.3
R A
S-4 B 3. E F 3
-9999 -9999 5.99 0.000 -9999 -9999
-9999 -9999 7.99 0.000 -9999 -9999
20190130151000 #timestamp-2
33 55.3
R A
S-4 B 3. E F 3
-9999 -9999 5.99 0.000 -9999 -9999
-9999 -9999 7.99 0.000 -9999 -9999
20190130152000 #timestamp-3
33 55.3
R A
S-4 B 3. E F 3
-9999 -9999 5.99 0.000 -9999 -9999
-9999 -9999 7.99 0.000 -9999 -9999
EOF
I am looking for a awk script to rearrange it in the the following way:
Write all the same timestamps together with its subsequent rows:
Please note: There is no such #timestamp words in the files, I just mentioned it to make understandble.
So the desire output will be like:
cat << EOF > out.txt
20190130150000
56 39
P A
S-4 B 4. E F 3
-9999 -9999 6.99 0.000 -9999 -9999
-9999 -9999 9.99 0.000 -9999 -9999
20190130150000
33 55.3
R A
S-4 B 3. E F 3
-9999 -9999 5.99 0.000 -9999 -9999
-9999 -9999 7.99 0.000 -9999 -9999
20190130151000
56 39
P A
S-4 B 4. E F 3
-9999 -9999 6.99 0.000 -9999 -9999
-9999 -9999 9.99 0.000 -9999 -9999
20190130151000
33 55.3
R A
S-4 B 3. E F 3
-9999 -9999 5.99 0.000 -9999 -9999
-9999 -9999 7.99 0.000 -9999 -9999
20190130152000
56 39
P A
S-4 B 4. E F 3
-9999 -9999 6.99 0.000 -9999 -9999
-9999 -9999 9.99 0.000 -9999 -9999
20190130152000
33 55.3
R A
S-4 B 3. E F 3
-9999 -9999 5.99 0.000 -9999 -9999
-9999 -9999 7.99 0.000 -9999 -9999
EOF
My script:
awk '
{
if ($1 ~ /^[0-9]{14}$/) {
timestamp = $1
print timestamp
next
}
print
}' file1.txt file2.txt
This script is just printing the same inputs. It is not printing desire output.
Using the Decorate-Sort-Undecorate idiom with any awk+sort+cut for any number of input files of practically any size and containing any order and/or values of timestamps (so it'll work for 1 input file or 100000 input files where some have timestamp 1 before timestamp 2, some have timestamps that don't exist in any other file, massive input files, etc.)
$ cat tst.sh
#!/usr/bin/env bash
awk -v OFS='\t' '$1 ~ /^[0-9]{14}$/{t=$1} {print t, NR, $0}' "${@:--}" |
sort -k1,1n -k2,2n |
cut -f3-
$ ./tst.sh file{1,2}.txt
20190130150000 #timestamp-1
56 39
P A
S-4 B 4. E F 3
-9999 -9999 6.99 0.000 -9999 -9999
-9999 -9999 9.99 0.000 -9999 -9999
20190130150000 #timestamp-1
33 55.3
R A
S-4 B 3. E F 3
-9999 -9999 5.99 0.000 -9999 -9999
-9999 -9999 7.99 0.000 -9999 -9999
20190130151000 #timestamp-2
56 39
P A
S-4 B 4. E F 3
-9999 -9999 6.99 0.000 -9999 -9999
-9999 -9999 9.99 0.000 -9999 -9999
20190130151000 #timestamp-2
33 55.3
R A
S-4 B 3. E F 3
-9999 -9999 5.99 0.000 -9999 -9999
-9999 -9999 7.99 0.000 -9999 -9999
20190130152000 #timestamp-3
56 39
P A
S-4 B 4. E F 3
-9999 -9999 6.99 0.000 -9999 -9999
-9999 -9999 9.99 0.000 -9999 -9999
20190130152000 #timestamp-3
33 55.3
R A
S-4 B 3. E F 3
-9999 -9999 5.99 0.000 -9999 -9999
-9999 -9999 7.99 0.000 -9999 -9999
That's doing the following:
awk -v OFS='\t' '...'
adds the most recent timestamp read from the input followed by the current line number in front of each input line so we can in step 2 sort on those to get the desired output order.sort -k1,1n -k2,2n
sorts by the timestamp and then by line number added in step 1 to group all lines by timestamp while retaining their original relative input order.cut -f3-
removes the leading timestamp and line number added by awk
in step 1 to get back to just the original content of each input line.As well as the conciseness and flexibility of the above approach, one additional benefit over trying to do it all in awk
or some other tool is that we're only storing 1 line at a time in memory using awk
so it'll work even for massive input files since only sort
has to handle all of the input "in memory" but it's designed to be able to do that by using demand paging, etc. while awk
is not and in fact awk
would slow down by orders of magnitude as it's array sizes increase if trying to store huge amounts of input.
If you're still not sure what the above is doing, just run the awk
script on it's own to see what it does (adds 2 fields to the front of the input):
$ awk -v OFS='\t' '$1 ~ /^[0-9]{14}$/{t=$1} {print t, NR, $0}' file{1,2}.txt
20190130150000 1 20190130150000 #timestamp-1
20190130150000 2 56 39
20190130150000 3 P A
20190130150000 4 S-4 B 4. E F 3
20190130150000 5 -9999 -9999 6.99 0.000 -9999 -9999
20190130150000 6 -9999 -9999 9.99 0.000 -9999 -9999
20190130151000 7 20190130151000 #timestamp-2
20190130151000 8 56 39
20190130151000 9 P A
20190130151000 10 S-4 B 4. E F 3
20190130151000 11 -9999 -9999 6.99 0.000 -9999 -9999
20190130151000 12 -9999 -9999 9.99 0.000 -9999 -9999
20190130152000 13 20190130152000 #timestamp-3
20190130152000 14 56 39
20190130152000 15 P A
20190130152000 16 S-4 B 4. E F 3
20190130152000 17 -9999 -9999 6.99 0.000 -9999 -9999
20190130152000 18 -9999 -9999 9.99 0.000 -9999 -9999
20190130150000 19 20190130150000 #timestamp-1
20190130150000 20 33 55.3
20190130150000 21 R A
20190130150000 22 S-4 B 3. E F 3
20190130150000 23 -9999 -9999 5.99 0.000 -9999 -9999
20190130150000 24 -9999 -9999 7.99 0.000 -9999 -9999
20190130151000 25 20190130151000 #timestamp-2
20190130151000 26 33 55.3
20190130151000 27 R A
20190130151000 28 S-4 B 3. E F 3
20190130151000 29 -9999 -9999 5.99 0.000 -9999 -9999
20190130151000 30 -9999 -9999 7.99 0.000 -9999 -9999
20190130152000 31 20190130152000 #timestamp-3
20190130152000 32 33 55.3
20190130152000 33 R A
20190130152000 34 S-4 B 3. E F 3
20190130152000 35 -9999 -9999 5.99 0.000 -9999 -9999
20190130152000 36 -9999 -9999 7.99 0.000 -9999 -9999
Then pipe that output to sort
to see what changes (lines reordered according to the numeric value of those 2 added fields):
$ awk -v OFS='\t' '$1 ~ /^[0-9]{14}$/{t=$1} {print t, NR, $0}' file{1,2}.txt |
sort -k1,1n -k2,2n
20190130150000 1 20190130150000 #timestamp-1
20190130150000 2 56 39
20190130150000 3 P A
20190130150000 4 S-4 B 4. E F 3
20190130150000 5 -9999 -9999 6.99 0.000 -9999 -9999
20190130150000 6 -9999 -9999 9.99 0.000 -9999 -9999
20190130150000 19 20190130150000 #timestamp-1
20190130150000 20 33 55.3
20190130150000 21 R A
20190130150000 22 S-4 B 3. E F 3
20190130150000 23 -9999 -9999 5.99 0.000 -9999 -9999
20190130150000 24 -9999 -9999 7.99 0.000 -9999 -9999
20190130151000 7 20190130151000 #timestamp-2
20190130151000 8 56 39
20190130151000 9 P A
20190130151000 10 S-4 B 4. E F 3
20190130151000 11 -9999 -9999 6.99 0.000 -9999 -9999
20190130151000 12 -9999 -9999 9.99 0.000 -9999 -9999
20190130151000 25 20190130151000 #timestamp-2
20190130151000 26 33 55.3
20190130151000 27 R A
20190130151000 28 S-4 B 3. E F 3
20190130151000 29 -9999 -9999 5.99 0.000 -9999 -9999
20190130151000 30 -9999 -9999 7.99 0.000 -9999 -9999
20190130152000 13 20190130152000 #timestamp-3
20190130152000 14 56 39
20190130152000 15 P A
20190130152000 16 S-4 B 4. E F 3
20190130152000 17 -9999 -9999 6.99 0.000 -9999 -9999
20190130152000 18 -9999 -9999 9.99 0.000 -9999 -9999
20190130152000 31 20190130152000 #timestamp-3
20190130152000 32 33 55.3
20190130152000 33 R A
20190130152000 34 S-4 B 3. E F 3
20190130152000 35 -9999 -9999 5.99 0.000 -9999 -9999
20190130152000 36 -9999 -9999 7.99 0.000 -9999 -9999
Then pipe to cut
to see what changes (first 2 fields removed):
$ awk -v OFS='\t' '$1 ~ /^[0-9]{14}$/{t=$1} {print t, NR, $0}' file{1,2}.txt |
sort -k1,1n -k2,2n |
cut -f3-
20190130150000 #timestamp-1
56 39
P A
S-4 B 4. E F 3
-9999 -9999 6.99 0.000 -9999 -9999
-9999 -9999 9.99 0.000 -9999 -9999
20190130150000 #timestamp-1
33 55.3
R A
S-4 B 3. E F 3
-9999 -9999 5.99 0.000 -9999 -9999
-9999 -9999 7.99 0.000 -9999 -9999
20190130151000 #timestamp-2
56 39
P A
S-4 B 4. E F 3
-9999 -9999 6.99 0.000 -9999 -9999
-9999 -9999 9.99 0.000 -9999 -9999
20190130151000 #timestamp-2
33 55.3
R A
S-4 B 3. E F 3
-9999 -9999 5.99 0.000 -9999 -9999
-9999 -9999 7.99 0.000 -9999 -9999
20190130152000 #timestamp-3
56 39
P A
S-4 B 4. E F 3
-9999 -9999 6.99 0.000 -9999 -9999
-9999 -9999 9.99 0.000 -9999 -9999
20190130152000 #timestamp-3
33 55.3
R A
S-4 B 3. E F 3
-9999 -9999 5.99 0.000 -9999 -9999
-9999 -9999 7.99 0.000 -9999 -9999