
How to print a Perl character class?


I was in a code review this morning and came across a bit of code that was wrong, but I couldn't tell why.

$line =~ /^[1-C]/;

This line was supposed to match a hex digit between 1 and C, but I suspect it does not do that. My question is not what it should match, but what it actually matches. Can I print out all the characters in a character class? Something like the following?

say join(', ', [1-C]);

Alas, it is not that simple:

# Examples:
say join(', ', 1..9);
say join(', ', 'A'..'C');
say join(', ', 1..'C');

# Output
Argument "C" isn't numeric in range (or flop) at X:\developers\PERL\Test.pl line 33.

1, 2, 3, 4, 5, 6, 7, 8, 9
A, B, C

Solution

  • It matches every code point from U+0031 ("1") to U+0043 ("C").

    The simple answer is to use

    map chr, ord("1")..ord("C")
    

    instead of

    "1".."C"
    

    as you can see in the following demonstration:

    $ perl -Mcharnames=:full -E'
       say sprintf " %s  U+%05X %s", chr($_), $_, charnames::viacode($_)
          for ord("1")..ord("C");
    '
     1  U+00031 DIGIT ONE
     2  U+00032 DIGIT TWO
     3  U+00033 DIGIT THREE
     4  U+00034 DIGIT FOUR
     5  U+00035 DIGIT FIVE
     6  U+00036 DIGIT SIX
     7  U+00037 DIGIT SEVEN
     8  U+00038 DIGIT EIGHT
     9  U+00039 DIGIT NINE
     :  U+0003A COLON
     ;  U+0003B SEMICOLON
     <  U+0003C LESS-THAN SIGN
     =  U+0003D EQUALS SIGN
     >  U+0003E GREATER-THAN SIGN
     ?  U+0003F QUESTION MARK
     @  U+00040 COMMERCIAL AT
     A  U+00041 LATIN CAPITAL LETTER A
     B  U+00042 LATIN CAPITAL LETTER B
     C  U+00043 LATIN CAPITAL LETTER C
    

    If you have Unicode::Tussle installed, you can get the same output from the following shell command:

    unichars -au '[1-C]'
    

    You might be interested in wasting time browsing the Unicode code charts. (This particular range is covered by "Basic Latin (ASCII)".)
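
The `ord`/`chr` approach above works for a single range. For an arbitrary character class, a rough alternative is to test every candidate code point against the class itself. The sketch below assumes that ASCII (0 to 0x7F) is a wide enough search space, and `class_members` is a hypothetical helper name, not something from a module. It also assumes the reviewer's intended class was `[1-9A-C]`, which the question implies but does not state.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper (not part of the original answer): return every
# character with a code point in 0..$max that the given class matches.
sub class_members {
    my ($class, $max) = @_;
    $max //= 0x7F;    # assumption: ASCII is enough for this question
    return grep { /\A$class\z/ } map { chr($_) } 0 .. $max;
}

my @actual   = class_members(qr/[1-C]/);
my @intended = class_members(qr/[1-9A-C]/);   # assumed intent: hex digits 1 through C

say 'actual:   ', join(' ', @actual);
say 'intended: ', join(' ', @intended);

# Characters the original class accepts by accident.
my %wanted = map { $_ => 1 } @intended;
say 'extra:    ', join(' ', grep { !$wanted{$_} } @actual);
```

The "extra" line should list the seven punctuation characters between "9" and "A" (`: ; < = > ? @`), which is exactly what makes `[1-C]` wrong for hex digits. Raising `$max` widens the search at the cost of one regex match per code point.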