
Parsing a file with D


I am new to D and would like to parse a biological file of the form

>name1
acgcgcagagatatagctagatcg
aagctctgctcgcgct
>name2
acgggggcttgctagctcgatagatcga
agctctctttctccttcttcttctagagaga
>name3
gag ggagag

such that I can capture the 'headers' (name1, name2, name3) together with the corresponding 'sequence' data (the ..acgcg... lines).

This is what I have so far, but it only iterates line by line and prints the header names:

import std.stdio;
import std.stream;
import std.regex;


int main(string[] args){
  auto filename = args[1];
  auto entry_name = regex(r"^>(.*)"); //captures header only
  auto fasta_regex = regex(r"(\>.+\n)([^\>]+\n)"); //captures header and corresponding sequence

  try {
    Stream file = new BufferedFile(filename);
    foreach(ulong n, char[] line; file) {
      auto name_capture = match(line,entry_name);
      writeln(name_capture.captures[1]);
    }

    file.close();
  }
  catch (FileException xy){
    writefln("Error reading the file: ");
  }

  catch (Exception xx){
    writefln("Exception occurred: " ~ xx.toString());
  }
  return 0;
}

I would like to know a nice way of extracting the header and the sequence data so that I can build an associative array in which each item corresponds to one entry in the file:

[name1:acgcgcagagatatagctagatcgaagctctgctcgcgct,name2:acgggggcttgctagctcgatagatcgaagctctctttctccttcttcttctagagaga,.....]

Solution

  • The header is on its own line, right? So check each line for a header, and use an appender to accumulate the sequence lines that follow it:

    auto current = std.array.appender!(char[]);//needs import std.array;
    string name;
    foreach(ulong n, char[] line; file) {
          auto entry = match(line,entry_name);
          if(entry){//we are in a header line

              if(name){//write what was caught
                  map[name]=current.data.idup;//idup because current.data is reused and map holds strings
              }
              name = entry.captures[1].idup;//group 1 is the header without the leading '>'
              current.clear();
          }else{
              current.put(line);
          }
    }
    if(name) map[name]=current.data.idup;//remember last capture
    

    Here map is where you'll store the values (a string[string] will do).
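
    For reference, here is a minimal, self-contained sketch of the same idea that does not depend on std.stream (which, as far as I know, is no longer shipped with newer Phobos releases). It reads lines with File.byLine and checks for the leading '>' directly instead of using a regex. The helper name parseFasta and the inRecord flag are my own choices for illustration and are not from the original code; treat it as one possible way to do it under those assumptions, not the definitive one.

```d
import std.array : appender;
import std.stdio : File, writeln;
import std.string : strip;

/// Builds a name -> sequence table from a range of lines.
/// `parseFasta` is a hypothetical helper name, not part of Phobos.
string[string] parseFasta(Range)(Range lines)
{
    string[string] map;
    string name;
    bool inRecord = false;            // true once the first header was seen
    auto current = appender!(char[]);

    foreach (line; lines)
    {
        auto text = line.strip;       // drop line endings and outer whitespace
        if (text.length > 0 && text[0] == '>')
        {
            // A new header starts: store the record collected so far.
            if (inRecord)
                map[name] = current.data.idup;  // copy, the buffer is reused
            name = text[1 .. $].idup;           // header without the '>'
            inRecord = true;
            current.clear();
        }
        else if (inRecord)
        {
            current.put(text);        // append this sequence line
        }
    }
    if (inRecord)
        map[name] = current.data.idup;  // store the last record
    return map;
}

void main(string[] args)
{
    if (args.length < 2)
    {
        writeln("usage: ", args[0], " <fasta-file>");
        return;
    }
    // byLine reuses its line buffer, which is why parseFasta copies with idup.
    auto map = parseFasta(File(args[1]).byLine);
    foreach (key, seq; map)
        writeln(key, ": ", seq);
}
```

    Because parseFasta is a template over any range of lines, it can also be called with an in-memory array such as [">name1", "acgc", "aagc"], which is handy for quick checks. Note that if two headers share the same name, the later entry overwrites the earlier one in the associative array.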