Tags: bash, file, sorting, awk, text-processing

Sort rows within data blocks in a file. Move rows in one block to a different location in the data block based on index in a column


I have data in a file that is arranged as below. This shows only two data blocks/iterations.

     21 ! <-- This is the number of lines of data in the data block/iteration.  It never changes.
 Linkages. Iteration:1_1010 ! <-- This number does not always increase by 5 like in this example, but always increases.
  A         1.010      -3.582      -3.135
  B         0.730      -4.428      -3.854
  B        -3.883       4.671       0.010
  A        -0.223       2.522      -4.893
  B         2.769       4.634       0.179
  B        -2.024      -3.640      -1.032
  A         4.613       3.914       1.567
  B         2.746      -0.545       1.430
  B        -0.532       3.380      -2.107
  C         3.944       2.513      -5.172
  C        -4.669       1.056       2.747
  C         0.645       0.001      -3.737
  C        -2.875      -1.233      -0.538
  C         4.279      -5.187      -2.820
  C         1.067      -2.279       2.021
  C         2.667      -1.558       0.588
  C         3.628      -0.025       2.464
  C        -0.023       1.717       1.175
  C         0.925      -1.548       2.273
  C         1.152       2.914       1.039
  C         0.878      -0.445      -0.948
     21
 Linkages. Iteration:1_1015 
  A         1.010      -3.582      -3.135
  B         0.730      -4.428      -3.854
  B        -3.883       4.671       0.010
  A        -0.223       2.522      -4.893
  B         2.769       4.634       0.179
  B        -2.024      -3.640      -1.032
  A         4.613       3.914       1.567
  B         2.746      -0.545       1.430
  B        -0.532       3.380      -2.107
  C         3.944       2.513      -5.172
  C        -4.669       1.056       2.747
  C         0.645       0.001      -3.737
  C        -2.875      -1.233      -0.538
  C         4.279      -5.187      -2.820
  C         1.067      -2.279       2.021
  C         2.667      -1.558       0.588
  C         3.628      -0.025       2.464
  C        -0.023       1.717       1.175
  C         0.925      -1.548       2.273
  C         1.152       2.914       1.039
  C         0.878      -0.445      -0.948

What I need to do is redistribute the "C" lines. Specifically, I need to divide the "C" lines into blocks of four, then move the first block of "C" lines below the first set of "ABB" lines, the second block below the second set of "ABB" lines, and so on. Here is an example for one data block/iteration (I would like to do the exact same thing for all data blocks/iterations in the file):

    21 
Linkages. Iteration:1_1010
  A         1.010      -3.582      -3.135
  B         0.730      -4.428      -3.854
  B        -3.883       4.671       0.010
  C         3.944       2.513      -5.172
  C        -4.669       1.056       2.747
  C         0.645       0.001      -3.737
  C        -2.875      -1.233      -0.538
  A        -0.223       2.522      -4.893
  B         2.769       4.634       0.179
  B        -2.024      -3.640      -1.032
  C         4.279      -5.187      -2.820
  C         1.067      -2.279       2.021
  C         2.667      -1.558       0.588
  C         3.628      -0.025       2.464
  A         4.613       3.914       1.567
  B         2.746      -0.545       1.430
  B        -0.532       3.380      -2.107
  C        -0.023       1.717       1.175
  C         0.925      -1.548       2.273
  C         1.152       2.914       1.039
  C         0.878      -0.445      -0.948
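
Put another way, the rule is index arithmetic: if the "C" rows of one data block are numbered k = 1..12 in their original order, row k belongs after "ABB" group number ceil(k/4). A minimal sketch of that mapping follows; the group size of 4 and the count of 12 come from the example above, while the helper name `c_row_group` is made up for illustration.

```shell
# c_row_group K SIZE: hypothetical helper; prints which "ABB" group the
# K-th "C" row (1-based) should follow when C rows are taken SIZE at a time.
c_row_group() {
    # int((k - 1) / size) + 1 is the integer form of ceil(k / size)
    awk -v k="$1" -v size="$2" 'BEGIN { print int((k - 1) / size) + 1 }'
}

for k in 1 4 5 8 9 12; do
    echo "C row $k -> after ABB group $(c_row_group "$k" 4)"
done
```

Rows 1 to 4 map to group 1, rows 5 to 8 to group 2, and rows 9 to 12 to group 3, which matches the layout shown above.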

I have been trying to do this in bash using "sort" but have not made much progress. I have found out that the general way to sort by a column index (like my first column) is to do this:

sort -n -k1 file

I also found this post (https://unix.stackexchange.com/questions/99582/sorting-blocks-of-lines) where the second answer uses "split" to split a file into blocks made of four lines:

split -a 6 -l 4 input_file my_prefix_

But I can't figure out how to move the four lines within a data block/iteration. If anyone knows of a resource that explains this, I would be glad to hear about it.
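
For context, here is a quick sketch of why plain `sort` is a poor fit: it orders lines across the whole file and has no notion of data blocks. The sample data below is a shortened, hypothetical stand-in for the real file, and `LC_ALL=C` is set only to make the tie-breaking order predictable.

```shell
# Build a tiny stand-in for the real file (hypothetical data, two iterations).
demo=$(mktemp)
cat > "$demo" <<'EOF'
     3
 Linkages. Iteration:1_1010
  A         1.010
  B         0.730
  C         3.944
     3
 Linkages. Iteration:1_1015
  A        -0.223
  B         2.769
  C         4.279
EOF

# sort compares lines across the entire file. With -n, keys that do not
# start with a number count as 0, so only the two count lines have a nonzero
# key and they end up together at the bottom, away from their blocks.
sorted=$(LC_ALL=C sort -n -k1 "$demo")
printf '%s\n' "$sorted"
rm -f "$demo"
```

The iteration headers and data rows from both blocks are interleaved in the result, so the per-block structure is lost.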


Solution

  • Using any awk in any shell on every Unix box:

    $ cat tst.awk
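    # Buffer every data row (first field is A, B, or C) until the block ends.
    $1 ~ /^[ABC]$/ {
        vals[++numVals] = $0
        next
    }
    # Any other line (the count line or the "Linkages" header) ends the
    # previous data block: flush the buffered rows, then print the line as-is.
    {
        prtVals()
        print
    }
    # Flush the final data block at end of input.
    END { prtVals() }
    
    # Names after the gap in the parameter list are local variables,
    # so blocks[] and the counters start out empty on every call.
    function prtVals(       row,valNr,blocks,numBlocks,blockNr,numCs) {
        if ( numVals != 0 ) {
            for (valNr=1; valNr<=numVals; valNr++) {
                row = vals[valNr]
                split(row,f)
                if ( f[1] == "A" ) {
                    # Each "A" row opens a new output group.
                    ++numBlocks
                }
                if ( f[1] == "C" ) {
                    # "C" rows 1, 5, 9, ... each start filling the next group.
                    if ( (++numCs % 4) == 1 ) {
                        blockNr++
                    }
                    blocks[blockNr] = blocks[blockNr] row ORS
                }
                else {
                    # "A" and "B" rows join the group opened by the latest "A".
                    blocks[numBlocks] = blocks[numBlocks] row ORS
                }
            }
            # Print the groups in order: ABB + CCCC, ABB + CCCC, ...
            for (blockNr=1; blockNr<=numBlocks; blockNr++) {
                printf "%s", blocks[blockNr]
            }
            delete vals
            numVals = 0
        }
    }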
    

    $ awk -f tst.awk file
         21 ! <-- This is the number of lines of data in the data block/iteration.  It never changes.
     Linkages. Iteration:1_1010 ! <-- This number does not always increase by 5 like in this example, but always increases.
      A         1.010      -3.582      -3.135
      B         0.730      -4.428      -3.854
      B        -3.883       4.671       0.010
      C         3.944       2.513      -5.172
      C        -4.669       1.056       2.747
      C         0.645       0.001      -3.737
      C        -2.875      -1.233      -0.538
      A        -0.223       2.522      -4.893
      B         2.769       4.634       0.179
      B        -2.024      -3.640      -1.032
      C         4.279      -5.187      -2.820
      C         1.067      -2.279       2.021
      C         2.667      -1.558       0.588
      C         3.628      -0.025       2.464
      A         4.613       3.914       1.567
      B         2.746      -0.545       1.430
      B        -0.532       3.380      -2.107
      C        -0.023       1.717       1.175
      C         0.925      -1.548       2.273
      C         1.152       2.914       1.039
      C         0.878      -0.445      -0.948
         21
     Linkages. Iteration:1_1015
      A         1.010      -3.582      -3.135
      B         0.730      -4.428      -3.854
      B        -3.883       4.671       0.010
      C         3.944       2.513      -5.172
      C        -4.669       1.056       2.747
      C         0.645       0.001      -3.737
      C        -2.875      -1.233      -0.538
      A        -0.223       2.522      -4.893
      B         2.769       4.634       0.179
      B        -2.024      -3.640      -1.032
      C         4.279      -5.187      -2.820
      C         1.067      -2.279       2.021
      C         2.667      -1.558       0.588
      C         3.628      -0.025       2.464
      A         4.613       3.914       1.567
      B         2.746      -0.545       1.430
      B        -0.532       3.380      -2.107
      C        -0.023       1.717       1.175
      C         0.925      -1.548       2.273
      C         1.152       2.914       1.039
      C         0.878      -0.445      -0.948
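
If the group size ever differs from four, the same buffering idea can be parameterized. The following is only a sketch under stated assumptions, not the script from the answer: the wrapper name `redistribute`, the `size` variable, and the tiny sample file are made up for illustration, and it assumes that each group starts with an "A" row and that the "C" rows come after all "A"/"B" rows of their data block.

```shell
# redistribute SIZE FILE: hypothetical wrapper, not part of the answer above.
redistribute() {
    awk -v size="$1" '
        # Print the groups collected for the current data block, then reset.
        function emit(   i, n) {
            n = (nGroups > cGroups ? nGroups : cGroups)
            for (i = 1; i <= n; i++) printf "%s", group[i]
            split("", group)              # portable way to empty an array
            nGroups = cGroups = nC = 0
        }
        $1 == "A" { ++nGroups }           # each "A" row opens a new group
        $1 == "C" {                       # C rows fill groups SIZE at a time
            cGroups = int(nC / size) + 1
            nC++
            group[cGroups] = group[cGroups] $0 ORS
            next
        }
        $1 == "A" || $1 == "B" { group[nGroups] = group[nGroups] $0 ORS; next }
        { emit(); print }                 # count or header line: new data block
        END { emit() }
    ' "$2"
}

# Small made-up sample: two "AB" groups and two "C" rows, group size 1.
demo=$(mktemp)
cat > "$demo" <<'EOF'
     6
 Linkages. Iteration:1_1010
  A 1
  B 2
  A 3
  B 4
  C 5
  C 6
EOF
redistribute 1 "$demo"
rm -f "$demo"
```

For the file in the question, the equivalent call would be `redistribute 4 file`. Passing the size with `-v` keeps the awk program itself unchanged when the layout changes.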