perl pack unpack

Perl: quick conversion from quaternary to decimal


I'm representing the nucleotides A, C, G, T as 0, 1, 2, 3, and afterwards I need to convert the resulting digit sequence, read as a quaternary (base-4) number, to decimal. Is there a way to do this in Perl? I'm not sure whether pack/unpack can do it.


Solution

  • Each base-4 digit takes exactly 2 bits, so the conversion is easy to do efficiently.

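    # Number of bits in this build's unsigned integer type (UV).
    my $uvsize = length(pack('J>', 0)) * 8;
    # '0' => '00', '1' => '01', '2' => '10', '3' => '11' (zero-padded to 2 bits).
    my %base4to2 = map { $_ => sprintf('%02b', $_) } 0..3;
    
    sub base4to10 {
       my ($s) = @_;
       # Replace every base-4 digit with its 2-bit binary form.
       $s =~ s/(.)/$base4to2{$1}/sg;
       # Left-pad with zeros to exactly $uvsize bits.
       $s = substr(("0" x $uvsize) . $s, -$uvsize);
       # Bit string -> bytes -> big-endian unsigned integer.
       return unpack('J>', pack('B*', $s));
    }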
    

    This allows inputs of up to 16 digits on builds supporting 32-bit integers, and up to 32 digits on builds supporting 64-bit integers. Longer inputs are silently truncated to their low-order digits by the substr call.
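
    A minimal sketch of a guard for these limits, assuming over-long input should be rejected rather than truncated. The name check_base4_input is hypothetical and not part of the answer; $Config{uvsize} is the size in bytes of the build's unsigned integer type.

    ```perl
    use strict;
    use warnings;
    use Config;

    # Each base-4 digit takes 2 bits, so a UV of N bytes holds N * 8 / 2 digits.
    my $max_digits = $Config{uvsize} * 4;

    # Hypothetical helper: returns the input unchanged if it is safe to convert.
    sub check_base4_input {
       my ($s) = @_;
       die "not a base-4 string: '$s'\n" unless $s =~ /\A[0-3]+\z/;
       (my $significant = $s) =~ s/\A0+(?=.)//;   # leading zeros add no bits
       die "too many digits (max $max_digits): '$s'\n"
          if length($significant) > $max_digits;
       return $s;
    }
    ```

    Calling check_base4_input($s) before the conversion turns an overflow into an error instead of a wrong result.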

    It's possible to support slightly larger numbers using floating-point values: up to 26 digits on builds with IEEE doubles (53-bit significand), and up to 56 digits on builds with IEEE quads (113-bit significand). This would require a different implementation.
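
    One possible shape for such an implementation, as a sketch only: accumulate the digits with ordinary arithmetic and let Perl fall back to a floating-point value once the integer range is exceeded. The name base4to10_nv is hypothetical. The result is exact only while it fits in the significand, so the digit limits above still apply.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: Horner's rule, one base-4 digit at a time.
    # Perl promotes the running total to a floating-point value (NV)
    # by itself when it no longer fits in a native integer.
    sub base4to10_nv {
       my ($s) = @_;
       die "not a base-4 string: '$s'\n" unless $s =~ /\A[0-3]+\z/;
       my $n = 0;
       for my $digit (split //, $s) {
          $n = $n * 4 + $digit;
       }
       return $n;
    }

    printf "%.0f\n", base4to10_nv("3" x 26);   # largest 26-digit input
    ```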

    Numbers larger than that require a module such as Math::BigInt for Perl to store them.
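
    A rough sketch of the Math::BigInt route, assuming a plain digit-by-digit loop is fast enough for the input sizes involved. The name base4to10_big is hypothetical; bmul and badd are Math::BigInt's in-place multiply and add methods.

    ```perl
    use strict;
    use warnings;
    use Math::BigInt;

    # Hypothetical helper: same digit-by-digit accumulation as with native
    # numbers, but the total is an arbitrary-precision Math::BigInt object.
    sub base4to10_big {
       my ($s) = @_;
       die "not a base-4 string: '$s'\n" unless $s =~ /\A[0-3]+\z/;
       my $n = Math::BigInt->new(0);
       for my $digit (split //, $s) {
          $n->bmul(4);        # shift the total left by one base-4 digit
          $n->badd($digit);   # add the new lowest digit
       }
       return $n;
    }

    print base4to10_big("3" x 40), "\n";   # 40 digits, beyond any native type
    ```

    The return value is an object; it stringifies to the decimal number when printed or interpolated.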


    Faster and simpler:

    my %base4to16 = (
       '0' => '0',   '00' => '0',   '20' => '8',
       '1' => '1',   '01' => '1',   '21' => '9',
       '2' => '2',   '02' => '2',   '22' => 'A',
       '3' => '3',   '03' => '3',   '23' => 'B',
                     '10' => '4',   '30' => 'C',
                     '11' => '5',   '31' => 'D',
                     '12' => '6',   '32' => 'E',
                     '13' => '7',   '33' => 'F',
    );
    
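    sub base4to10 {
       # A lone digit is consumed first only when the length is odd,
       # so that the digit pairs line up with the right-hand end.
       (my $s = $_[0]) =~ s/(.(?=(?:..)*\z)|..)/$base4to16{$1}/sg;
       return hex($s);
    }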
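
    To tie this back to the question, here is a hedged end-to-end sketch that goes from a nucleotide string straight to a number. It swaps in a different technique from the two above: each letter is mapped to its 2-bit pattern and the bit string is handed to oct with a "0b" prefix. The names %nt2bits and nt_to_decimal are hypothetical, and the same digit limits as for native integers apply.

    ```perl
    use strict;
    use warnings;

    # A=0, C=1, G=2, T=3, written as 2-bit patterns (the question's mapping).
    my %nt2bits = (A => '00', C => '01', G => '10', T => '11');

    # Hypothetical helper: nucleotide string -> decimal value.
    sub nt_to_decimal {
       my ($seq) = @_;
       die "unexpected character in '$seq'\n" unless $seq =~ /\A[ACGT]+\z/;
       (my $bits = $seq) =~ s/(.)/$nt2bits{$1}/g;
       return oct("0b$bits");   # oct() parses binary when given a 0b prefix
    }

    print nt_to_decimal("GATTACA"), "\n";   # digits 2033010 in base 4
    ```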