I have a problem. This is a small fragment of my input file:
SOL168 MGD750
SOL259 MGD11
SOL363 MGD38
SOL168 MGD142
SOL363 MGD784
SOL660 MGD752
SOL440 MGD38
SOL440 MGD38
I need to count a specific kind of repetition. A SOL counts once if it appears in the first column of two different lines, where one line has MGD1-MGD225 in the second column and the other line has MGD676-MGD900. For example,
SOL115 MGD201
SOL115 MGD782
counts as one. Another example:
SOL749 MGD751
SOL749 MGD111
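A minimal sketch of how each line maps to a layer under the rule above, assuming the second column is always MGD followed by an integer. classify_layers is a hypothetical helper name used only for illustration. Both example SOLs end up with one line in layer 1 and one in layer 2, which is why each pair counts once.

```shell
#!/bin/bash
# classify_layers is a hypothetical helper name, not part of the question.
# It reads "SOLxxx MGDnnn" lines on stdin and labels each one:
#   layer 1 = MGD1-225, layer 2 = MGD676-900, layer 0 = neither
classify_layers() {
    awk '{
        n = substr($2, 4) + 0              # strip the "MGD" prefix, keep the number
        if (n >= 1 && n <= 225)        layer = 1
        else if (n >= 676 && n <= 900) layer = 2
        else                           layer = 0
        print $1, $2, "layer", layer
    }'
}

# The two examples from the question
printf '%s\n' "SOL115 MGD201" "SOL115 MGD782" "SOL749 MGD751" "SOL749 MGD111" |
    classify_layers
```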
For my input file, I expect the output
2
because SOL363 has bonds with MGD38 (first layer) and MGD784 (second layer), which is the first vertical water bridge, and
SOL168 has bonds with MGD750 (second layer) and MGD142 (first layer), which is the second.
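To check the expected output of 2 by hand, here is a sketch that prints the qualifying SOL IDs instead of only counting them; piping its output to wc -l gives the count. It assumes the "at least one bond in each layer" reading of the rule, and list_bridges is a hypothetical helper name.

```shell
#!/bin/bash
# list_bridges is a hypothetical helper name. It prints every SOL that has at
# least one bond in MGD1-225 and at least one bond in MGD676-900.
list_bridges() {
    awk '{
        n = substr($2, 4) + 0              # MGD number from column 2
        if (n >= 1 && n <= 225)   l1[$1] = 1
        if (n >= 676 && n <= 900) l2[$1] = 1
    }
    END {
        for (s in l1)
            if (s in l2) print s
    }' | sort                              # awk for-in order is unspecified, so sort it
}

# The sample fragment from the question
printf '%s\n' "SOL168 MGD750" "SOL259 MGD11" "SOL363 MGD38" "SOL168 MGD142" \
              "SOL363 MGD784" "SOL660 MGD752" "SOL440 MGD38" "SOL440 MGD38" |
    list_bridges
```

On the sample this lists SOL168 and SOL363, matching the two bridges described above.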
Now it works. Here is my whole script:
#!/bin/bash
# I run this script on 100 files, which is why I use a for loop
for index in {1..100}
do
    awk '
    BEGIN { FS = "MGD" }
    $2 >= 1   && $2 <= 225 { layer1[$1]++ }   # bond to the first layer
    $2 >= 676 && $2 <= 900 { layer2[$1]++ }   # bond to the second layer
    END {
        total = 0    # so that a file with no bridges prints 0, not an empty line
        for (sql in layer1) {
            if (layer1[sql] == 1 && layer2[sql] == 1)
                ++total
        }
        print total
    }
    ' "eq5_15_333_lipid_sol_fragment_$index.ndx" >> vertical_water_bridges.txt
done
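A sketch of the same loop, assuming you want each count labeled with its file name so the 100 results can be told apart. count_bridges is a hypothetical helper name wrapping the same awk program (same FS, ranges, and "exactly one of each" test). Redirecting once after done truncates the output file on each run, so rerunning does not append stale results the way >> inside the loop does.

```shell
#!/bin/bash
# count_bridges is a hypothetical helper name; it takes one file name.
count_bridges() {
    awk '
    BEGIN { FS = "MGD" }
    $2 >= 1   && $2 <= 225 { layer1[$1]++ }
    $2 >= 676 && $2 <= 900 { layer2[$1]++ }
    END {
        total = 0
        for (sql in layer1)
            if (layer1[sql] == 1 && layer2[sql] == 1) ++total
        print total
    }' "$1"
}

# One "file count" line per input file, written once per run
for index in {1..100}; do
    f="eq5_15_333_lipid_sol_fragment_$index.ndx"
    [ -e "$f" ] || continue                # skip missing files instead of an awk error
    printf '%s %s\n' "$f" "$(count_bridges "$f")"
done > vertical_water_bridges.txt
```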
Using MGD as your field separator, $2 becomes the MGD number, which tells you the layer, and awk can express your problem statement pretty directly (saved here as a.awk):
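BEGIN { FS = "MGD" }                      # $1 = SOL id plus a trailing space, $2 = MGD number
$2 >= 1   && $2 <= 225 { layer1[$1]++ }   # first layer
$2 >= 676 && $2 <= 900 { layer2[$1]++ }   # second layer
END {
    total = 0
    for (sql in layer1) {
        if (sql in layer2)                # SOL bonded to both layers
            ++total
    }
    print total
}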
$ awk -f a.awk file
2
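The question's script and this answer differ only when a SOL has more than one bond in the same layer: layer1[sql] == 1 && layer2[sql] == 1 rejects it, while sql in layer2 accepts it. A sketch comparing the two on a made-up input (SOL5 and its lines are hypothetical data, and count_exact and count_any are hypothetical helper names). Which behavior is right depends on whether such a SOL should count as a vertical water bridge; the question does not say.

```shell
#!/bin/bash
#   count_exact: exactly one first-layer and one second-layer line (question's test)
#   count_any:   at least one line in each layer (this answer's test)
count_exact() {
    awk 'BEGIN { FS = "MGD" }
         $2 >= 1   && $2 <= 225 { l1[$1]++ }
         $2 >= 676 && $2 <= 900 { l2[$1]++ }
         END { t = 0; for (s in l1) if (l1[s] == 1 && l2[s] == 1) ++t; print t }'
}
count_any() {
    awk 'BEGIN { FS = "MGD" }
         $2 >= 1   && $2 <= 225 { l1[$1]++ }
         $2 >= 676 && $2 <= 900 { l2[$1]++ }
         END { t = 0; for (s in l1) if (s in l2) ++t; print t }'
}

# Hypothetical input: SOL5 has two first-layer bonds and one second-layer bond
printf '%s\n' "SOL5 MGD10" "SOL5 MGD20" "SOL5 MGD700" | count_exact    # prints 0
printf '%s\n' "SOL5 MGD10" "SOL5 MGD20" "SOL5 MGD700" | count_any      # prints 1
```

On the sample fragment from the question both versions print 2, because there no SOL has two bonds in the same layer together with a bond in the other layer.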