The reverse complement
of a DNA sequence is formed by interchanging the letters A
with T
, C
with G
(and vice versa) as a first step, and subsequently reversing the entire anagrammed sequence. For example, the reverse
of ACCTGAG is GAGTCCA (simply by reading the sequence from the end to the beginning), while the reverse complement
is CTCAGGT (by interchanging the letters first and then reversing the resulted sequence).
How can I achieve the desired reverse complement
output in SQL server? I provide the following SQL code for convenience.
DECLARE @seq NVARCHAR(MAX)
SET @seq = 'ACCTGAG'
SELECT @seq AS [Sequence], REVERSE(@seq) AS [Reverse sequence]
---output---
Sequence Reverse sequence
ACCTGAG GAGTCCA
---Desired output---
CTCAGGT
You can easily achieve this with TRANSLATE
and REVERSE
:
DECLARE @YourString varchar(20);
SET @YourString = 'ACCTGAG';
SELECT REVERSE(TRANSLATE(@YourString,'ACTG','TGAC'));
TRANSLATE
replaces each character in the second parameter with the character in the same ordinal position in the 3rd parameter. So for the above, A
would be replace by T
, C
with G
, T
with A
and G
and C
.
Importantly, as well, it will only process each character once, meaning that (for example), a character "translated" from A
to T
, will not later be translated back to A
in the above expression. Nested REPLACE
functions, however, would suffer that problem, so you would have to replace some with arbitrary characters first, and then replace them correct at the end.