
Perl to set/reset 8th bit on string


Given a string, I want to

  • set the 8th bit (0x80) on every char from A to I (0x41 ~ 0x49 becomes 0xC1 ~ 0xC9).
  • reset the 8th bit on every char that has it on.

Like,

$s='@ABCDEFGHIJKLMNOPQRS';
$s1= join "", map { $_ |= 0x80 if /A-I/ } split //, $s;
$s2= join "", map { $_ &= ~0x80 } split //, $s1;

I think my above code is close, but it's not fully working.
Please help.


Solution

  • Your code has a few problems.

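    Firstly, the block that you pass to map has to return the desired value for every element. Your code is setting $_ to the desired value (which is fine, but unnecessary), but your block for $s1 only yields that value when the if condition is true; for every other character it yields a false value instead of the unchanged character. Since your regex never matches (see the next point), you end up with an empty string. (Your code for $s2 does return the result for every element, so that one is fine in this respect, albeit written a bit strangely.)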
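
    To illustrate the point about return values, here is a minimal sketch. The two-character input and the use of lc are only for demonstration and are not part of your original code:

    ```perl
    use strict;
    use warnings;

    my @chars = split //, 'AB';

    # Statement-modifier form: for 'B' the condition is false, so the block
    # yields a false value and 'B' contributes nothing to the joined string.
    my $dropped = join "", map { $_ if /A/ } @chars;

    # Ternary form: every element yields an explicit value.
    my $kept = join "", map { /A/ ? lc($_) : $_ } @chars;

    print "dropped=[$dropped] kept=[$kept]\n";
    ```

    The second form is the shape used in the corrected code: each element produces either the modified character or the original one.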

    Secondly, /A-I/ is not the right regex. You meant to write /[A-I]/.
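
    A quick way to see the difference between the two patterns, using the string from the question (the variable names here are just for illustration):

    ```perl
    use strict;
    use warnings;

    my $s = '@ABCDEFGHIJKLMNOPQRS';

    # /A-I/ looks for the literal three-character sequence "A-I".
    my $literal = ($s =~ /A-I/) ? 1 : 0;

    # /[A-I]/ is a character class: any single character from A through I.
    my @in_class = grep { /[A-I]/ } split //, $s;

    print "literal match: $literal\n";
    print "class matches: @in_class\n";
    ```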

    Thirdly, when Perl converts between strings and numbers (e.g. because you've called a bitwise operator with a string and a number), it does so by parsing the string as a number or formatting the number in base 10. For example, '3' | 12 is equivalent to 3 | 12, i.e. 15, which then gets converted to '15' if necessary. That's not what you want; rather, you're interested in the ASCII/Unicode/byte value of characters in the string. For that sort of conversion, you need to use ord (character → number) and chr (number → character). But you can't write chr(~0x80), because ~0x80 is 0xFFFFFFFFFFFFFF7F (assuming a 64-bit system), which is not a valid character code. Instead, you need to write chr(0x7F), or "\x7F", or else apply the chr after the bitwise operation, by writing e.g. chr(ord($_) & ~0x80). Alternatively, if both operands are strings, the bitwise operators work on the characters' values directly, so $_ | "\x80" and $_ & "\x7F" also do what you want.
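
    The following sketch contrasts the three behaviours. It assumes the bitwise feature is not enabled; that feature is switched on by use v5.28 or later, and under it | is always numeric while the string form is spelled |. instead. The variable names are illustrative only:

    ```perl
    use strict;
    use warnings;

    # Numeric bitwise OR: the string '3' is parsed as the number 3.
    my $numeric = '3' | 12;                 # 3 | 12 == 15

    # Going through ord/chr operates on the character code instead.
    my $via_ord = chr(ord('A') | 0x80);     # 0x41 | 0x80 == 0xC1

    # With two string operands, | works on the characters' codes directly.
    my $via_str = 'A' | "\x80";

    printf "%d %02X %02X\n", $numeric, ord($via_ord), ord($via_str);
    ```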

    So, putting it together, you can write this:

    $s = '@ABCDEFGHIJKLMNOPQRS';
    $s1 = join "", map { $_ | (/[A-I]/ ? "\x80" : "") } split //, $s;
    $s2 = join "", map { $_ & "\x7F" } split //, $s1;
    

    or this:

    $s = '@ABCDEFGHIJKLMNOPQRS';
    $s1 = join "", map { $_ = chr((ord $_) | 0x80) if /[A-I]/; $_ } split //, $s;
    $s2 = join "", map { chr(ord($_) & ~0x80) } split //, $s1;
    

    or any of various other permutations along those lines.
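
    One such permutation, offered only as a sketch and not something from your original code, avoids the split/map/join round trip by using tr/// with ranges. It assumes every character in the string has a code below 0x100:

    ```perl
    use strict;
    use warnings;

    my $s = '@ABCDEFGHIJKLMNOPQRS';

    # Map A..I (0x41..0x49) onto 0xC1..0xC9; work on a copy so $s is untouched.
    (my $s1 = $s) =~ tr/A-I/\xC1-\xC9/;

    # Map every code 0x80..0xFF onto 0x00..0x7F, i.e. clear the 8th bit.
    (my $s2 = $s1) =~ tr/\x80-\xFF/\x00-\x7F/;

    # Show the character codes of $s1 in hex, then confirm the round trip.
    print join(' ', map { sprintf '%02X', ord($_) } split //, $s1), "\n";
    print $s2 eq $s ? "round trip ok\n" : "round trip failed\n";
    ```

    Both ranges in each tr/// have the same length, so characters map one-to-one; anything outside the left-hand range is left as it is.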