
bash: cat + grep to merge two files, inserting several copies of one into the other


Using the Linux bash command line, I need to merge two files by inserting several copies of file 1 into specific places in file 2. File 1 looks like this:

ATOM      1  N   SER A   1      -2.390   4.343 -17.003  1.00 27.76           N1+
ATOM      2  CA  SER A   1      -2.066   5.647 -16.370  1.00 27.12           C  
ATOM      3  C   SER A   1      -2.394   5.608 -14.874  1.00 26.29           C  
ATOM      4  O   SER A   1      -3.014   4.627 -14.405  1.00 22.93           O  
ATOM      5  CB  SER A   1      -2.771   6.798 -17.057  1.00 28.10           C  
ATOM      6  OG  SER A   1      -2.538   8.023 -16.373  1.00 32.02           O  
ATOM      7  N   GLY A   2      -1.982   6.655 -14.162  1.00 25.31           N  
ATOM      8  CA  GLY A   2      -2.172   6.779 -12.716  1.00 24.93           C  
ATOM      9  C   GLY A   2      -0.888   6.336 -12.067  1.00 23.66           C  
ATOM     10  O   GLY A   2      -0.168   5.459 -12.608  1.00 27.42           O  
ATOM     11  N   PHE A   3      -0.636   6.866 -10.900  1.00 22.07           N  
ATOM     12  CA  PHE A   3       0.622   6.595 -10.191  1.00 21.70           C  
ATOM     13  C   PHE A   3       0.279   6.570  -8.716  1.00 20.39           C  
ATOM     14  O   PHE A   3      -0.265   7.544  -8.167  1.00 23.83           O  

File 2 is a multi-block file whose separate parts start with MODEL 1, MODEL 2, ..., MODEL N and are each terminated by ENDMDL:

MODEL 1
REMARK VINA RESULT:    -7.828      0.000      0.000
REMARK INTER + INTRA:         -13.769
REMARK INTER:                 -10.110
REMARK INTRA:                  -3.659
REMARK UNBOUND:                -3.196
ENDMDL
MODEL 2
REMARK VINA RESULT:    -7.828      0.000      0.000
REMARK INTER + INTRA:         -13.769
REMARK INTER:                 -10.110
REMARK INTRA:                  -3.659
REMARK UNBOUND:                -3.196
ENDMDL
MODEL 3
REMARK VINA RESULT:    -7.828      0.000      0.000
REMARK INTER + INTRA:         -13.769
REMARK INTER:                 -10.110
REMARK INTRA:                  -3.659
REMARK UNBOUND:                -3.196
ENDMDL

I need to copy the entire content of file 1 into file 2 just before each ENDMDL separator, so that file 2 ends up containing one copy of file 1 per model. Here is an example of the expected output:

MODEL 1
REMARK VINA RESULT:    -7.828      0.000      0.000
REMARK INTER + INTRA:         -13.769
REMARK INTER:                 -10.110
REMARK INTRA:                  -3.659
REMARK UNBOUND:                -3.196
ATOM      1  N   SER A   1      -2.390   4.343 -17.003  1.00 27.76           N1+
ATOM      2  CA  SER A   1      -2.066   5.647 -16.370  1.00 27.12           C  
ATOM      3  C   SER A   1      -2.394   5.608 -14.874  1.00 26.29           C  
ATOM      4  O   SER A   1      -3.014   4.627 -14.405  1.00 22.93           O  
ATOM      5  CB  SER A   1      -2.771   6.798 -17.057  1.00 28.10           C  
ATOM      6  OG  SER A   1      -2.538   8.023 -16.373  1.00 32.02           O  
ATOM      7  N   GLY A   2      -1.982   6.655 -14.162  1.00 25.31           N  
ATOM      8  CA  GLY A   2      -2.172   6.779 -12.716  1.00 24.93           C  
ATOM      9  C   GLY A   2      -0.888   6.336 -12.067  1.00 23.66           C  
ATOM     10  O   GLY A   2      -0.168   5.459 -12.608  1.00 27.42           O  
ATOM     11  N   PHE A   3      -0.636   6.866 -10.900  1.00 22.07           N  
ATOM     12  CA  PHE A   3       0.622   6.595 -10.191  1.00 21.70           C  
ATOM     13  C   PHE A   3       0.279   6.570  -8.716  1.00 20.39           C  
ATOM     14  O   PHE A   3      -0.265   7.544  -8.167  1.00 23.83           O 
ENDMDL
MODEL 2
REMARK VINA RESULT:    -7.828      0.000      0.000
REMARK INTER + INTRA:         -13.769
REMARK INTER:                 -10.110
REMARK INTRA:                  -3.659
REMARK UNBOUND:                -3.196
ATOM      1  N   SER A   1      -2.390   4.343 -17.003  1.00 27.76           N1+
ATOM      2  CA  SER A   1      -2.066   5.647 -16.370  1.00 27.12           C  
ATOM      3  C   SER A   1      -2.394   5.608 -14.874  1.00 26.29           C  
ATOM      4  O   SER A   1      -3.014   4.627 -14.405  1.00 22.93           O  
ATOM      5  CB  SER A   1      -2.771   6.798 -17.057  1.00 28.10           C  
ATOM      6  OG  SER A   1      -2.538   8.023 -16.373  1.00 32.02           O  
ATOM      7  N   GLY A   2      -1.982   6.655 -14.162  1.00 25.31           N  
ATOM      8  CA  GLY A   2      -2.172   6.779 -12.716  1.00 24.93           C  
ATOM      9  C   GLY A   2      -0.888   6.336 -12.067  1.00 23.66           C  
ATOM     10  O   GLY A   2      -0.168   5.459 -12.608  1.00 27.42           O  
ATOM     11  N   PHE A   3      -0.636   6.866 -10.900  1.00 22.07           N  
ATOM     12  CA  PHE A   3       0.622   6.595 -10.191  1.00 21.70           C  
ATOM     13  C   PHE A   3       0.279   6.570  -8.716  1.00 20.39           C  
ATOM     14  O   PHE A   3      -0.265   7.544  -8.167  1.00 23.83           O 
ENDMDL
MODEL 3
REMARK VINA RESULT:    -7.828      0.000      0.000
REMARK INTER + INTRA:         -13.769
REMARK INTER:                 -10.110
REMARK INTRA:                  -3.659
REMARK UNBOUND:                -3.196
ATOM      1  N   SER A   1      -2.390   4.343 -17.003  1.00 27.76           N1+
ATOM      2  CA  SER A   1      -2.066   5.647 -16.370  1.00 27.12           C  
ATOM      3  C   SER A   1      -2.394   5.608 -14.874  1.00 26.29           C  
ATOM      4  O   SER A   1      -3.014   4.627 -14.405  1.00 22.93           O  
ATOM      5  CB  SER A   1      -2.771   6.798 -17.057  1.00 28.10           C  
ATOM      6  OG  SER A   1      -2.538   8.023 -16.373  1.00 32.02           O  
ATOM      7  N   GLY A   2      -1.982   6.655 -14.162  1.00 25.31           N  
ATOM      8  CA  GLY A   2      -2.172   6.779 -12.716  1.00 24.93           C  
ATOM      9  C   GLY A   2      -0.888   6.336 -12.067  1.00 23.66           C  
ATOM     10  O   GLY A   2      -0.168   5.459 -12.608  1.00 27.42           O  
ATOM     11  N   PHE A   3      -0.636   6.866 -10.900  1.00 22.07           N  
ATOM     12  CA  PHE A   3       0.622   6.595 -10.191  1.00 21.70           C  
ATOM     13  C   PHE A   3       0.279   6.570  -8.716  1.00 20.39           C  
ATOM     14  O   PHE A   3      -0.265   7.544  -8.167  1.00 23.83           O 
ENDMDL

I have tried cat, but it just concatenated the two files without the required replication of the first file:

cat file1.pdb file2.pdb > together.pdb

Do I need to pipe this to some grep expression in order to replicate file 1 before each ENDMDL in file 2?


Solution

  • Here is an awk solution that doesn't call the unsafe system() or getline:

    awk 'NR==FNR {s = s $0 ORS; next} $0 == "ENDMDL" {$0 = s $0} 1' file1 file2
    

    If the file names are stored in shell variables, use:

    awk 'NR==FNR {s = s $0 ORS; next}
    $0 == "ENDMDL" {$0 = s $0} 1' "$file1" "$file2"
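
To see how the one-liner works, the sketch below spreads the same awk program over several commented lines and runs it on two tiny stand-in files. The temporary file names, the `block` variable name, and the shortened ATOM/REMARK lines are assumptions made for the demo, not part of the original data.

```shell
#!/usr/bin/env bash
# Demo sketch: same logic as the awk one-liner, run on made-up sample files.
set -eu

tmpdir=$(mktemp -d)
file1="$tmpdir/file1.pdb"   # hypothetical stand-in for the ATOM file
file2="$tmpdir/file2.pdb"   # hypothetical stand-in for the multi-model file

printf 'ATOM 1\nATOM 2\n' > "$file1"
printf 'MODEL 1\nREMARK a\nENDMDL\nMODEL 2\nREMARK b\nENDMDL\n' > "$file2"

merged=$(awk '
    # NR == FNR is true only while the first file is being read:
    # append each of its lines (plus a newline) to the buffer and move on.
    NR == FNR { block = block $0 ORS; next }

    # Second file: when the line is exactly ENDMDL, put the buffered
    # block in front of it.
    $0 == "ENDMDL" { $0 = block $0 }

    # Print every line of the second file, modified or not.
    { print }
' "$file1" "$file2")

printf '%s\n' "$merged"
rm -rf "$tmpdir"
```

Two details to keep in mind: the comparison `$0 == "ENDMDL"` is exact, so an ENDMDL line carrying trailing spaces or a Windows carriage return would not match, and the `NR == FNR` test assumes the first file is not empty. To keep the result, redirect the awk output to a new file rather than to file 2 itself.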