I am new to D and would like to parse a biological file of the form
gag ggagag
such that I can capture the 'headers' name1, name2, name3 together with the corresponding 'sequence' data, the ..acgcg... stuff.
This is what I have so far, but it only iterates line by line:
import std.stdio;
import std.stream;
import std.regex;
import std.file; // for FileException

int main(string[] args) {
    auto filename = args[1];
    auto entry_name = regex(r"^>(.*)"); // captures header only
    auto fasta_regex = regex(r"(\>.+\n)([^\>]+\n)"); // captures header and corresponding sequence
    try {
        Stream file = new BufferedFile(filename);
        foreach (ulong n, char[] line; file) {
            auto name_capture = match(line, entry_name);
        }
    }
    catch (FileException xy) {
        writefln("Error reading the file: ");
    }
    catch (Exception xx) {
        // writeln, so that a '%' in the message is not treated as a format specifier
        writeln("Exception occurred: " ~ xx.toString());
    }
    return 0;
}
I would like to know a nice way of extracting the header and the sequence data, so that I can create an associative array in which each item corresponds to an entry in the file.
The header is on its own line, right? So why not check for it and use an appender to build up the value:
auto current = std.array.appender!(char[]); // needs import std.array;
string name;
foreach (ulong n, char[] line; file) {
    auto entry = match(line, entry_name);
    if (entry) { // we are in a header line
        if (name.length) { // write what was caught
            map[name] = current.data.idup; // idup because current's buffer is reused
            current.clear();
        }
        name = entry.hit.idup;
    } else {
        current.put(line); // sequence line: append to the current entry
    }
}
map[name] = current.data.idup; // remember last capture
map is where you'll store the values (a string[string] will do).
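
For reference, here is a minimal self-contained sketch of the same idea. It rests on a few assumptions that are not in the code above: it reads the file with std.stdio's File.byLine rather than std.stream (which, as far as I know, is no longer shipped with current Phobos releases), it detects headers with startsWith instead of a regex, and it stores the header without the leading '>'. The helper name parseFasta is hypothetical.

```d
import std.algorithm.searching : startsWith;
import std.array : appender;
import std.stdio : File, writefln;
import std.string : strip;

/// Hypothetical helper: groups FASTA lines into a header -> sequence map.
/// Accepts any range of lines (string[] or File.byLine).
string[string] parseFasta(R)(R lines)
{
    string[string] map;
    auto current = appender!(char[])();
    string name;
    bool haveHeader = false; // false until the first '>' line is seen

    foreach (line; lines)
    {
        auto text = line.strip; // drop trailing newline and whitespace
        if (text.startsWith(">"))
        {
            // A new header: store the entry collected so far, if any.
            if (haveHeader)
                map[name] = current.data.idup;
            current.clear();
            name = text[1 .. $].idup; // header without the leading '>'
            haveHeader = true;
        }
        else if (haveHeader)
        {
            current.put(text); // sequence line: append to the current entry
        }
    }
    if (haveHeader)
        map[name] = current.data.idup; // remember the last entry
    return map;
}

void main(string[] args)
{
    if (args.length < 2)
        return;
    // byLine reuses its buffer, which is why the helper copies with idup.
    auto map = parseFasta(File(args[1]).byLine);
    foreach (header, sequence; map)
        writefln("%s: %s", header, sequence);
}
```

Multi-line sequences are joined into one string per header; lines that appear before the first header are ignored in this sketch.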