im dealing with that quite some time now. I have characters from lets say the Latin alphabet and want them to be encoded in uppercase alpha strings only. Is there any module that could do this? Or any BaseX encoding that i can modify to just use uc alpha characters?
i currently have implemented parts of it using regex substitutions, but it only covers a few characters and is definetly not efficient :)
anyway if there is no way to deal with that via a module or function, is there any way to do this efficient via a regex?
i thought about a tr/[\+,\-,...]/[PLUS,MINUS,...]/cds;
but it seems like tr only substitutes char by char and not char by sequence of chars :(
any ideas?
achim
To answer the tr
question:
%subs = ( '+' => 'PLUS' );
my $pat = join '|', map quotemeta, keys %subs;
s/($pat)/$subs{$1}/g;
Base 26 is possible to do, but it's a bit hard and inefficient to implement since 26 is not a power of 2. But it's definitely what you want. I'll see about coding it up.
In the meantime, here's a base 16 solution:
sub bytes_to_base16 {
my $e = unpack('H*', $_);
$e =~ tr/0123456789ABCDEFabcdef/ABCDEFGHIJKLMNOPKLMNOP/;
return $e;
}
sub base16_to_bytes {
my $e = $_[0];
$e =~ tr/ABCDEFGHIJKLMNOP/0123456789ABCDEF/;
return pack('H*', $_);
}
Let's see how efficient base 26 is compared to base 16:
$ perl -MMath::BigInt -MMath::BigFloat -E'
my $n = Math::BigInt->new(1);
my $bs = 0;
for (1..10) {
$n <<= 8;
++$bs;
my $bd16 = 2*$bs;
my $bd26 = Math::BigFloat->new($n)->blog(26, 5)->bceil->numify;
say sprintf "%2d bytes takes %2d base16 digits or %2d base26 digits.".
" base26 is %3.0f%% of the size of base16.",
$bs, $bd16, $bd26, $bd26/$bd16*100;
}
'
1 bytes takes 2 base16 digits or 2 base26 digits. base26 is 100% of the size of base16.
2 bytes takes 4 base16 digits or 4 base26 digits. base26 is 100% of the size of base16.
3 bytes takes 6 base16 digits or 6 base26 digits. base26 is 100% of the size of base16.
4 bytes takes 8 base16 digits or 7 base26 digits. base26 is 88% of the size of base16.
5 bytes takes 10 base16 digits or 9 base26 digits. base26 is 90% of the size of base16.
6 bytes takes 12 base16 digits or 11 base26 digits. base26 is 92% of the size of base16.
7 bytes takes 14 base16 digits or 12 base26 digits. base26 is 86% of the size of base16.
8 bytes takes 16 base16 digits or 14 base26 digits. base26 is 88% of the size of base16.
9 bytes takes 18 base16 digits or 16 base26 digits. base26 is 89% of the size of base16.
10 bytes takes 20 base16 digits or 18 base26 digits. base26 is 90% of the size of base16.
An efficient implementation would produce slightly less efficient output.
$ perl -MMath::BigInt -MMath::BigFloat -E'
my $bs = 0;
for (1..10) {
++$bs;
my $bd16 = 2*$bs;
my $bd26 = int($bs/4)*7 + ($bs%4)*2;
say sprintf "%2d bytes takes %2d base16 digits or %2d base26 digits.".
" base26 is %3.0f%% of the size of base16.",
$bs, $bd16, $bd26, $bd26/$bd16*100;
}
'
1 bytes takes 2 base16 digits or 2 base26 digits. base26 is 100% of the size of base16.
2 bytes takes 4 base16 digits or 4 base26 digits. base26 is 100% of the size of base16.
3 bytes takes 6 base16 digits or 6 base26 digits. base26 is 100% of the size of base16.
4 bytes takes 8 base16 digits or 7 base26 digits. base26 is 88% of the size of base16.
5 bytes takes 10 base16 digits or 9 base26 digits. base26 is 90% of the size of base16.
6 bytes takes 12 base16 digits or 11 base26 digits. base26 is 92% of the size of base16.
7 bytes takes 14 base16 digits or 13 base26 digits. base26 is 93% of the size of base16.
8 bytes takes 16 base16 digits or 14 base26 digits. base26 is 88% of the size of base16.
9 bytes takes 18 base16 digits or 16 base26 digits. base26 is 89% of the size of base16.
10 bytes takes 20 base16 digits or 18 base26 digits. base26 is 90% of the size of base16.
Note that the efficient implementation uses an extra digits for inputs that are 7 bytes long.
So is it worth the effort of using base26 over base16? Probably not, unless each byte is really precious.
And finally, here's a base 26 implementation.
my @syms = ('A'..'Z');
my %syms = map { $syms[$_] => $_ } 0..$#syms;
sub bytes_to_base26 {
my $e = '';
my $full_blocks = int(length($_[0]) / 4);
for (0..$full_blocks-1) {
my $block = unpack('N', substr($_[0], $_*4, 4));
$e .= join '', @syms[
$block / 26**6 % 26,
$block / 26**5 % 26,
$block / 26**4 % 26,
$block / 26**3 % 26,
$block / 26**2 % 26,
$block / 26**1 % 26,
$block / 26**0 % 26,
];
}
my $extra = substr($_[0], $full_blocks*4);
for my $block (unpack('C*', $extra)) {
$e .= join '', @syms[
$block / 26**1 % 26,
$block / 26**0 % 26,
];
}
return $e;
}
sub base26_to_bytes {
my $d = '';
my $full_blocks = int(length($_[0]) / 7);
for (0..$full_blocks-1) {
my $block = 0;
$block = $block*26 + $syms{$_} for unpack '(a)*', substr($_[0], $_*7, 7);
$d .= pack('N', $block);
}
my $extra = substr($_[0], $full_blocks*7);
my @extra = unpack('(a)*', $extra);
while (@extra) {
my $block = 0;
$block = $block*26 + $syms{ shift(@extra) };
$block = $block*26 + $syms{ shift(@extra) };
$d .= pack('C', $block);
}
return $d;
}