Tags: perl, algorithm, bioinformatics, dna-sequence

Generating Synthetic DNA Sequence with Substitution Rate


Given these inputs:

my $init_seq = "AAAAAAAAAA"; # length 10 bp
my $sub_rate = 0.003;
my $nof_tags = 1000;
my @dna = qw( A C G T );

I want to generate:

  1. One thousand length-10 tags

  2. Each position in a tag is substituted with probability 0.003

Yielding output like:

AAAAAAAAAA
AATAACAAAA
.....
AAGGAAAAGA # 1000th tag

Is there a compact way to do it in Perl?

I am stuck on the logic. This is the core of my script so far:

#!/usr/bin/perl

my $init_seq = "AAAAAAAAAA"; # length 10 bp
my $sub_rate = 0.003;
my $nof_tags = 1000;
my @dna = qw( A C G T );

    $i = 0;
    while ($i < length($init_seq)) {
        $roll = int(rand 4) + 1;       # $roll is now an integer between 1 and 4

        if ($roll == 1) {$base = 'A';}
        elsif ($roll == 2) {$base = 'T';}
        elsif ($roll == 3) {$base = 'C';}
        elsif ($roll == 4) {$base = 'G';}

        print $base;
    }
    continue {
        $i++;
    }

Solution

  • As a small optimisation, replace:

        $roll = int(rand 4) + 1;       # $roll is now an integer between 1 and 4
    
        if ($roll == 1) {$base = 'A';}
        elsif ($roll == 2) {$base = 'T';}
        elsif ($roll == 3) {$base = 'C';}
        elsif ($roll == 4) {$base = 'G';}
    

    with

        $base = $dna[int(rand 4)];
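
  • The loop in the question draws a random base at every position, so it never uses $sub_rate. A minimal sketch of one way to apply the rate is shown here. It assumes that each position mutates independently and that a substitution always changes the base (the replacement is drawn from the other three). The helper name mutate_tag is made up for this example and does not come from the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $init_seq = "AAAAAAAAAA";   # length 10 bp
my $sub_rate = 0.003;          # per-position substitution probability
my $nof_tags = 1000;
my @dna      = qw( A C G T );

# Return a copy of $seq in which each position is substituted
# independently with probability $rate.
# Assumption: a substitution always yields a different base.
sub mutate_tag {
    my ($seq, $rate) = @_;
    my @bases = split //, $seq;
    for my $base (@bases) {            # $base aliases the array element
        next unless rand() < $rate;    # keep the original base most of the time
        my @others = grep { $_ ne $base } @dna;
        $base = $others[ int rand @others ];
    }
    return join '', @bases;
}

# Print one mutated copy of the initial sequence per tag.
print mutate_tag($init_seq, $sub_rate), "\n" for 1 .. $nof_tags;
```

    With a rate of 0.003 over 10 positions, the expected number of substitutions is 0.03 per tag, or roughly 30 across all 1000 tags, so most output lines will be identical to $init_seq. If a "substitution" may also redraw the same base, replace the grep line with a plain draw from @dna.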