Search code examples
sqlsql-serverreverse

Reverse complement of a string in SQL Server


The reverse complement of a DNA sequence is formed by interchanging the letters A with T , C with G (and vice versa) as a first step, and subsequently reversing the entire anagrammed sequence. For example, the reverse of ACCTGAG is GAGTCCA (simply by reading the sequence from the end to the beginning), while the reverse complement is CTCAGGT (by interchanging the letters first and then reversing the resulted sequence).

How can I achieve the desired reverse complement output in SQL server? I provide the following SQL code for convenience.

DECLARE @seq NVARCHAR(MAX)
SET @seq = 'ACCTGAG'

SELECT @seq AS [Sequence], REVERSE(@seq) AS [Reverse sequence]

---output---
Sequence    Reverse sequence
ACCTGAG     GAGTCCA

---Desired output---
CTCAGGT

Solution

  • You can easily achieve this with TRANSLATE and REVERSE:

    DECLARE @YourString varchar(20);
    
    SET @YourString = 'ACCTGAG';
    
    SELECT REVERSE(TRANSLATE(@YourString,'ACTG','TGAC'));
    

    TRANSLATE replaces each character in the second parameter with the character in the same ordinal position in the 3rd parameter. So for the above, A would be replace by T, C with G, T with A and G and C.

    Importantly, as well, it will only process each character once, meaning that (for example), a character "translated" from A to T, will not later be translated back to A in the above expression. Nested REPLACE functions, however, would suffer that problem, so you would have to replace some with arbitrary characters first, and then replace them correct at the end.