
Printing utf-8 text in Perl debugger using x command


Test file (test.pl):

#!/usr/bin/perl

use utf8;
use strict;
use warnings;

my $var = 'Здравствуйте';

print $var;

We run it in the debugger:

$ perl -d ./test.pl
Loading DB routines from perl5db.pl version 1.55
Editor support available.

Enter h or 'h h' for help, or 'man perldebug' for more help.

main::(./test.pl:7):    my $var = 'Здравствуйте';
  DB<1> p $var

Wide character in print at (eval 8)[/usr/share/perl/5.30/perl5db.pl:738] line 2.
 at (eval 8)[/usr/share/perl/5.30/perl5db.pl:738] line 2.
    eval 'no strict; ($@, $!, $^E, $,, $/, $\\, $^W) = @DB::saved;package main; $^D = $^D | $DB::db_stop;
print {$DB::OUT}   $var;
' called at /usr/share/perl/5.30/perl5db.pl line 738
    DB::eval called at /usr/share/perl/5.30/perl5db.pl line 3138
    DB::DB called at ./test.pl line 9
Здравствуйте

  DB<2> x $var
0  '\x{0417}\x{0434}\x{0440}\x{0430}\x{0432}\x{0441}\x{0442}\x{0432}\x{0443}\x{0439}\x{0442}\x{0435}'

Two questions:

  1. How can I get rid of the "Wide character in print" warning after the p command?
  2. How can I make the debugger print the text itself instead of character codes when dumping a variable with x?

Solution

  • For p, set the debugger's output handle to UTF-8, as described in a separate answer. In short: call binmode($DB::OUT, ':utf8') from inside the debugger.

    For x, the debugger loads the file dumpvar.pl with require and uses its dumpValue sub to format the output.

    Unfortunately, that code replaces every code point above 255 with a \x{...} escape:

    sub uniescape {
        join("",
         map { $_ > 255 ? sprintf("\\x{%04X}", $_) : chr($_) }
             unpack("W*", $_[0]));
    }
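
    To see the effect in isolation, here is a minimal sketch that copies the uniescape logic quoted above into a standalone script. The sample strings and the use of STDOUT are only for illustration and are not part of dumpvar.pl:

    ```perl
    use strict;
    use warnings;
    use utf8;

    # Copy of the uniescape logic quoted above, for illustration only
    sub uniescape {
        join("",
             map { $_ > 255 ? sprintf("\\x{%04X}", $_) : chr($_) }
                 unpack("W*", $_[0]));
    }

    binmode(STDOUT, ':utf8');

    # Every Cyrillic letter is above 255, so each one becomes an escape
    print uniescape('Здравствуйте'), "\n";

    # 'é' is code point 233, below the threshold, so it is left alone
    print uniescape('café'), "\n";
    ```

    The first line printed is the same escaped string that x shows in the question, which suggests this sub is what produces it.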
    

    As a workaround, you can comment out the call to uniescape (line 102 in dumpvar.pl). Since the file is loaded with require, you can also put a modified copy earlier in @INC if you don't want to change the original.

    Another possibility is to redefine uniescape() after dumpvar.pl has been loaded. The following code in your .perldb file does that:

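    sub afterinit{
        # Read and write the debugger's handles as UTF-8
        binmode($DB::IN, ':utf8');
        binmode($DB::OUT, ':utf8');
        # Load dumpvar.pl now so that uniescape can be overridden
        require 'dumpvar.pl';
        # Return the string unchanged instead of escaping it
        *dumpvar::uniescape = sub{return $_[0]};
    }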
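
    The override can also be tried outside the debugger. The sketch below assumes that your dumpvar.pl has an internal helper named dumpvar::stringify that quotes a scalar and passes it through uniescape. That helper is not mentioned above and may differ between Perl versions, so treat the script as an experiment rather than a supported interface:

    ```perl
    use strict;
    use warnings;
    use utf8;

    binmode(STDOUT, ':utf8');
    require 'dumpvar.pl';    # ships with Perl; defines package dumpvar

    my $var = 'Здравствуйте';

    # Assumption: stringify is the internal sub that calls uniescape
    my $before = dumpvar::stringify($var);

    {
        # Replacing a sub through its glob normally warns; silence that here
        no warnings qw(redefine once);
        *dumpvar::uniescape = sub { return $_[0] };
    }

    my $after = dumpvar::stringify($var);

    print "before: $before\n";
    print "after:  $after\n";
    ```

    Calls to uniescape inside dumpvar.pl are looked up at run time, so the glob assignment takes effect even though dumpvar.pl was compiled earlier.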
    

    If you want to switch the behavior during a debugger session, use a flag (here $DB::x_uniescape) to control uniescape():

    sub afterinit{
        binmode($DB::IN, ':utf8');
        binmode($DB::OUT, ':utf8');
        require 'dumpvar.pl';
        *dumpvar::uniescape = sub{
            # Flag unset or false: leave the string unescaped
            return $_[0] unless $DB::x_uniescape;
            # Flag true: original behavior, escape code points above 255
            join("",
                 map { $_ > 255 ? sprintf("\\x{%04X}", $_) : chr($_) }
                     unpack("W*", $_[0]));
        };
    }
    

    Then in the debugger:

    main::(test.pl:11): print $var;
      DB<1> x $var                                                                  
    0  'Здравствуйте'
      DB<2> $DB::x_uniescape = 1                                                    
    
      DB<3> x $var                                                                  
    0  '\x{0417}\x{0434}\x{0440}\x{0430}\x{0432}\x{0441}\x{0442}\x{0432}\x{0443}\x{0439}\x{0442}\x{0435}'
      DB<4> $DB::x_uniescape = 0                                                    
    
      DB<5> x $var                                                                  
    0  'Здравствуйте'
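
    The "Wide character in print" warning from the p command is not specific to the debugger. As a rough sketch, the script below reproduces it on an ordinary filehandle and shows that a :utf8 layer silences it. The in-memory handles and the warning counter are only there to make the result easy to check:

    ```perl
    use strict;
    use warnings;
    use utf8;

    my $var = 'Здравствуйте';

    # Collect warnings instead of printing them
    my @warnings;
    local $SIG{__WARN__} = sub { push @warnings, $_[0] };

    # Handle without an encoding layer: characters above 255 trigger the warning
    open(my $raw, '>', \my $raw_buf) or die "open: $!";
    print {$raw} $var;
    close $raw;

    # Handle with a :utf8 layer, like binmode($DB::OUT, ':utf8') in the debugger
    open(my $enc, '>', \my $enc_buf) or die "open: $!";
    binmode($enc, ':utf8');
    print {$enc} $var;
    close $enc;

    printf "warnings caught: %d\n", scalar @warnings;
    print $warnings[0] if @warnings;
    ```

    Only the first print should produce a warning, which is why calling binmode on $DB::OUT is enough to fix p.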