Determine width in terminal of Asian/Japanese characters?


In my terminal these are equally wide:

ヌー平行
parallel
æøåüäöûß

[Images: same width of "ヌー平行" and "parallel"; same width of "ヌ" and "p"]

I have managed to get Perl to report a length of 8 for the last two lines, but it reports the length of the first line as 4. Is there a way for me to determine that ヌ is twice as wide as ø?


Solution

  • You can use Text::CharWidth's mbswidth. It uses POSIX's wcwidth.

    use v5.14;
    use warnings;
    
    use utf8;
    use open ':std', ':encoding(UTF-8)';
    
    use Encode             qw( encode_utf8 );
    use Text::CharWidth    qw( mbswidth );
    use Unicode::Normalize qw( NFC NFD );
    
    my @tests = (
       [ "ASCII",     "parallel",      8 ],
       [ "NFC",       NFC("æøåüäöûß"), 8 ],
       [ "NFD",       NFD("æøåüäöûß"), 8 ],
       [ "EastAsian", "ヌー平行",      8 ],
    );
    
    for ( @tests ) {
       my ( $name, $s, $expect ) = @$_;
       my $length = length( $s );
       my $got = mbswidth( encode_utf8( $s ) );
       printf "%-9s length=%2d expect=%d got=%d\n", 
          $name, $length, $expect, $got;
    }
    
    Output:

    ASCII     length= 8 expect=8 got=8
    NFC       length= 8 expect=8 got=8
    NFD       length=13 expect=8 got=8
    EastAsian length= 4 expect=8 got=8
    

    Note that mbswidth expects a string encoded using the locale's encoding. The above program assumes that encoding is UTF-8 in two places: the `use open ':std', ':encoding(UTF-8)'` line and the call to encode_utf8.


    If you want to know the number of columns a string should take according to Unicode, this is covered by Unicode Standard Annex #11 (East Asian Width). Note that the answer may depend on whether one is in an East Asian context or not. For example, U+03A6 GREEK CAPITAL LETTER PHI ("Φ") takes up two columns in an East Asian context, while it takes up only one otherwise.
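
    As a rough illustration of the Annex #11 approach, the width can be approximated in pure Perl from the East_Asian_Width Unicode property, which Perl exposes in regular expressions. This is only a sketch: columns_uax11 is a hypothetical helper, not part of any module, and it makes simplifying assumptions (nonspacing and enclosing marks count as zero columns, Wide and Fullwidth as two, Ambiguous as two only when the caller asks for an East Asian context, everything else as one). It ignores control characters and emoji sequences, and a terminal may not agree with it in every case.

```perl
use v5.14;
use warnings;
use utf8;    # the string literals below are written in UTF-8

# Hypothetical helper: approximate the number of terminal columns using
# the East_Asian_Width property described in UAX #11.
# $east_asian is a flag: true means "treat Ambiguous characters as wide".
sub columns_uax11 {
    my ( $s, $east_asian ) = @_;
    my $cols = 0;
    for my $ch ( split //, $s ) {
        if ( $ch =~ /[\p{Mn}\p{Me}]/ ) {
            # Combining marks (e.g. from NFD) are assumed to take no column.
        }
        elsif ( $ch =~ /\p{East_Asian_Width=Wide}|\p{East_Asian_Width=Fullwidth}/ ) {
            $cols += 2;
        }
        elsif ( $ch =~ /\p{East_Asian_Width=Ambiguous}/ ) {
            $cols += $east_asian ? 2 : 1;
        }
        else {
            $cols += 1;    # Narrow, Halfwidth, Neutral
        }
    }
    return $cols;
}

say columns_uax11( "parallel", 0 );    # 8
say columns_uax11( "ヌー平行", 0 );    # 8
say columns_uax11( "Φ",        0 );    # 1
say columns_uax11( "Φ",        1 );    # 2
```

    Unlike mbswidth, this takes a decoded character string, so no encode_utf8 call is needed and the result does not depend on the locale.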