Split a DNA sequence into a list of codons with D

DNA strings are written over an alphabet of four characters: A, C, G, and T. Given a string such as

ATGTTTAAA

I would like to split it into its constituent codons:

ATG TTT AAA

   codons = ["ATG","TTT","AAA"]

Codons encode amino acids, and the code is redundant: several codons can map to the same amino acid (http://en.wikipedia.org/wiki/DNA_codon_table).

I have a DNA string in D and would like to split it into a range of codons and later translate/map the codons to amino acids.

std.algorithm has a splitter function, which requires a delimiter, and the std.regex splitter requires a regex to split the string on. Is there an idiomatic approach to splitting a string into fixed-size pieces without a delimiter?

Solution

It looks like you are looking for std.range.chunks:

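import std.range : chunks;
import std.encoding : AsciiString;
import std.algorithm : map;

// Reinterpret a string literal as ASCII so that range functions do not decode UTF-8.
AsciiString ascii(string literal)
{
    return cast(AsciiString) literal;
}

void main()
{
    auto input = ascii("ATGTTTAAA");
    auto codons = input.chunks(3); // lazy range of 3-character slices
    auto aminoacids = codons.map!(
        (codon) {
            if (codon == ascii("ATG"))
                return "M";
            // ...
            return "?"; // placeholder so that every path returns a value
        }
    );
}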
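
To make the elided mapping step concrete, here is a minimal sketch that replaces the chain of if statements with an associative-array lookup. The function name translate, the three-entry table, and the "?" placeholder for unknown or incomplete codons are assumptions made for illustration; a real table would cover all 64 codons.

```d
import std.algorithm : map;
import std.array : join;
import std.encoding : AsciiString;
import std.range : chunks;

// Same helper as above: reinterpret an ASCII-only literal without copying.
AsciiString ascii(string literal)
{
    return cast(AsciiString) literal;
}

// Hypothetical helper: translate a DNA sequence codon by codon.
// Codons missing from the table (including a short trailing chunk)
// are rendered as "?", an arbitrary placeholder for this sketch.
string translate(AsciiString dna)
{
    // Deliberately partial table: only three of the 64 codons.
    string[AsciiString] table = [
        ascii("ATG"): "M", // methionine
        ascii("TTT"): "F", // phenylalanine
        ascii("AAA"): "K", // lysine
    ];

    return dna.chunks(3)
              .map!(codon => table.get(codon, "?"))
              .join;
}

void main()
{
    import std.stdio : writeln;

    writeln(translate(ascii("ATGTTTAAA"))); // prints MFK
}
```

Because chunks yields a shorter final chunk when the length is not a multiple of three, a trailing partial codon simply falls through to the placeholder instead of causing an error.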

Please note that I am using AsciiString (http://dlang.org/phobos/std_encoding.html#.AsciiString) here instead of plain string literals. This avoids the costly UTF-8 decoding that is done for string when it is used as a range, and which is never needed for an actual DNA sequence. I remember this making a notable performance difference in similar bioinformatics code.
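
If you would rather not introduce AsciiString, another way to sidestep the decoding is to view the string as raw bytes with std.string.representation before chunking. This is a different technique from the one above, sketched under the assumption that the input is pure ASCII; the function name codonsOf is made up for the example, and no benchmark against the AsciiString version is implied.

```d
import std.algorithm : map;
import std.array : array;
import std.range : chunks;
import std.string : representation;

// Hypothetical helper: split an ASCII DNA string into codons.
// representation() exposes the string as immutable(ubyte)[], so
// chunks() can slice it directly instead of decoding code points.
string[] codonsOf(string dna)
{
    return dna.representation
              .chunks(3)
              .map!(c => cast(string) c) // view each byte slice as text again
              .array;
}

void main()
{
    import std.stdio : writeln;

    writeln(codonsOf("ATGTTTAAA")); // prints ["ATG", "TTT", "AAA"]
}
```

Calling .array materializes the codons eagerly, which matches the list form in the question; drop it to keep the range lazy.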