
Awk replace column AFTER matched line


I have a PDB file that is returned from a receptor/ligand docking prediction. I don't know why the authors of the program named the chains "A" for both receptor and ligand, but I want to change it. This should be a basic thing that I want to do and I am not sure why I cannot find any example on the internet. What I want to do is simple.

  1. Match a line, for example "HEADER lig"
  2. Then for every line after that replace column $5 with a "B"

Here is an example of the input file:

ATOM   9197  OG  SER A1176     103.395 152.201 139.176  1.00  0.00      RA2  O
ATOM   9198  HG  SER A1176     104.092 151.786 138.659  1.00  0.00      RA2  H
ATOM   9199  C   SER A1176     101.857 153.749 136.254  1.00  0.00      RA2  C
ATOM   9200  O   SER A1176     102.183 152.962 135.366  1.00  0.00      RA2  O
TER
HEADER lig.006.10.pdb
ATOM      1  N   GLY A  25     182.812 181.892 153.587  1.00  0.00      LA0  N
ATOM      2  H   GLY A  25     182.954 182.546 152.840  1.00  0.00      LA0  H
ATOM      3  CA  GLY A  25     183.834 180.858 153.715  1.00  0.00      LA0  C
ATOM      4  C   GLY A  25     184.544 180.646 152.391  1.00  0.00      LA0  C
ATOM      5  O   GLY A  25     184.450 181.466 151.487  1.00  0.00      LA0  O
ATOM      6  N   PRO A  26     185.249 179.494 152.297  1.00  0.00      LA0  N
ATOM      7  CD  PRO A  26     185.371 178.458 153.319  1.00  0.00      LA0  C

I tried the command below, but it only replaces column $5 on the first line after the match. I am not sure why nothing is posted about this case anywhere.

awk '{ print; } /^HEADER lig/ { getline; $5="B"; print }' model.006.10.pdb


Solution

  • awk '{ if (headerfound==1) { $5="B" } } /^HEADER lig/ { headerfound=1 } { print }' model.006.10.pdb
    

    Three parts:

    1. headerfound==1 ==> assign "B" to the 5th column

    2. /^HEADER lig/ ==> does the line start with "HEADER lig"?

    3. a plain print to output the (possibly changed) line.

    A short explanation: part 2, the detection of ^HEADER lig, comes after the headerfound==1 check because, when the header is found, the current line is the HEADER line itself, and we do not want to assign "B" to the 5th column of that line.

    On every following line, we first check whether any previous line contained the header (headerfound==1) and, if so, update $5.
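
    One caveat with the one-liner above: assigning to $5 makes awk rebuild the whole record with single spaces between fields, so the fixed-width PDB columns are lost. If a ligand line had a four-digit residue number (like the receptor's "A1176"), the chain letter and residue number would also be fused into a single field. Below is a minimal sketch of a column-preserving variant. It assumes the file follows the standard PDB layout, where the chain ID sits in column 22 of ATOM/HETATM records, which holds for the sample lines in the question. The file name sample.pdb and the variable name found are made up for illustration.

```shell
# Small sample in the same layout as the question's file (hypothetical file name).
cat > sample.pdb <<'EOF'
ATOM   9200  O   SER A1176     102.183 152.962 135.366  1.00  0.00      RA2  O
TER
HEADER lig.006.10.pdb
ATOM      1  N   GLY A  25     182.812 181.892 153.587  1.00  0.00      LA0  N
EOF

out=$(awk '
    # After the ligand header: overwrite character 22 only, leave the rest untouched.
    found && /^(ATOM|HETATM)/ { $0 = substr($0, 1, 21) "B" substr($0, 23) }
    # Set the flag after the check above, so the HEADER line itself is not modified.
    /^HEADER lig/ { found = 1 }
    { print }
' sample.pdb)
printf '%s\n' "$out"
```

    On the real file, the same awk program can be pointed at model.006.10.pdb, with the output redirected to a new file. Lines such as TER or blank lines are left alone because only ATOM/HETATM records are rewritten.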