Search code examples
perlunicodecharacter-properties

How do I get a list of all Unicode characters that have a given property?


Without looping over the entire range of Unicode characters, how can I get a list of characters that have a given property? In particular I want a list of all characters that are digits (i.e. those that match /\d/). I have looked at Unicode::UCD, and it is useful for determining the properties of a given character, but there doesn't seem to be a way to get a list characters that have a property out of it.


Solution

  • The list of Unicode characters for each class is generated from the Unicode spec when you compile Perl, and is typically stored in /usr/lib/perl-YOURPERLVERSION/unicore/lib/gc_sc/

    For example, the list of Unicode character ranges that match IsDigit (a.k.a. \d) is stored in the file /usr/lib/perl-YOURPERLVERSION/unicore/lib/gc_sc/Digit.pl