Why does Email::Stuffer base64-encode differently than MIME::Base64 and how does utf8 fit in?


I want to send a simple email with Email::Stuffer. As expected, it encodes headers containing non-ASCII characters as encoded words. But when I decode them back (either in my mail client or in Perl), I get different text, and MIME::Base64 encodes the same text differently to begin with.

use strict;
use warnings;
use Email::Stuffer;
use MIME::Base64;

my $text = 'Ümläut';
print "$text in base64: ", encode_base64($text, ''), "\n";
print "and back: ", decode_base64(encode_base64($text)), "\n";

my $stuffer = Email::Stuffer->subject($text);
my $dump = $stuffer->as_string();
print "Mail dump:\n---\n$dump\n---\n";

$dump =~ m{^Subject:\s*=\?UTF-8\?B\?(.+)\?=}m;
my $encoded = $1;
print "in Subject: $encoded\n";
my $decoded = decode_base64($encoded);
print "subject decoded: $decoded\n";

This prints:

Ümläut in base64: w5xtbMOkdXQ=
and back: Ümläut
Mail dump:
---
Date: Sat, 7 Oct 2023 16:31:59 -0500
MIME-Version: 1.0
Subject: =?UTF-8?B?w4PCnG1sw4PCpHV0?=


---
in Subject: w4PCnG1sw4PCpHV0
subject decoded: Ãmläut

(On the shell, echo "Ümläut" | base64 agrees with MIME::Base64 and prints w5xtbMOkdXQK; the final K instead of = comes from the newline that echo appends.)

The program source code is encoded in UTF-8. When I add use utf8; after use warnings;, the first two prints no longer show the expected umlauts, but Email::Stuffer works as expected:

�ml�ut in base64: 3G1s5HV0
and back: �ml�ut
Mail dump:
---
Date: Sat, 7 Oct 2023 16:32:50 -0500
MIME-Version: 1.0
Subject: =?UTF-8?B?w5xtbMOkdXQ=?=


---
in Subject: w5xtbMOkdXQ=
subject decoded: Ümläut

What is the difference here / why does this happen and how can I get both MIME::Base64 and Email::Stuffer to agree?


Solution

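  • ->subject expects text (decoded text, a string of Unicode Code Points) and encodes it as UTF-8 itself.

    encode_base64 expects bytes (such as text encoded using UTF-8).

    Without use utf8;, $text holds the 8 UTF-8 bytes of the source file. encode_base64 gets the bytes it wants, but ->subject treats each byte as a Code Point and encodes it as UTF-8 a second time. With use utf8;, $text holds 6 Code Points. ->subject gets the text it wants, but encode_base64 turns each Code Point into a single byte, and fails with a "Wide character" error for any Code Point above 255. So keep the string decoded, and encode it explicitly before passing it to encode_base64.

    Fixed: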

    use strict;
    use warnings;
    use feature qw( say );
    
    use utf8;                               # Source code is encoded using UTF-8.
    use open ':std', ':encoding(UTF-8)';    # Terminal expects/provides UTF-8.
    
    use Email::Stuffer qw( );
    use Encode         qw( decode encode );
    use MIME::Base64   qw( decode_base64 encode_base64 );
    
    my $text_ucp = 'Ümläut';                # String of Unicode Code Points.
    say $text_ucp;                          # Ümläut
    
    my $text_utf8_base64 = encode_base64( encode( "UTF-8", $text_ucp ), '');
    say $text_utf8_base64;                  # w5xtbMOkdXQ=
    
    my $roundtrip_ucp = decode( "UTF-8", decode_base64( $text_utf8_base64 ) );
    say $roundtrip_ucp;                     # Ümläut
    
    my $stuffer = Email::Stuffer->subject( $text_ucp );
    print $stuffer->as_string();            # =?UTF-8?B?w5xtbMOkdXQ=?=
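
To see where each of the three Base64 strings in the question comes from, here is a minimal sketch using only Encode and MIME::Base64. The variable names are made up for illustration, and the text is written with \x{...} escapes so the result does not depend on use utf8; or on how the file is saved. The last step assumes Encode's MIME-Header encoding is a convenient way to decode the Subject value instead of matching it with a regex.

```perl
use strict;
use warnings;

use Encode       qw( decode encode );
use MIME::Base64 qw( encode_base64 );

# Six Code Points (U+00DC m l U+00E4 u t), i.e. decoded text.
my $text_ucp  = "\x{DC}ml\x{E4}ut";
# The same text as 8 UTF-8 bytes.
my $text_utf8 = encode( 'UTF-8', $text_ucp );

# Bytes in, as encode_base64 expects.
my $correct = encode_base64( $text_utf8, '' );

# The 8 bytes mistaken for 8 Code Points and encoded as UTF-8 again.
# This mirrors passing undecoded bytes to ->subject (no "use utf8;").
my $double = encode_base64( encode( 'UTF-8', $text_utf8 ), '' );

# Code Points passed straight to encode_base64: one byte per Code Point.
# This mirrors calling encode_base64 on decoded text ("use utf8;").
my $latin1 = encode_base64( $text_ucp, '' );

print "correct:        $correct\n";   # w5xtbMOkdXQ=
print "double-encoded: $double\n";    # w4PCnG1sw4PCpHV0
print "one byte each:  $latin1\n";    # 3G1s5HV0

# Decode an encoded word back to text in one step.
my $subject = decode( 'MIME-Header', "=?UTF-8?B?${correct}?=" );
print "header decodes back to the original text\n" if $subject eq $text_ucp;
```

The second and third values are the ones the question shows in the Subject header without use utf8; and in the first print with use utf8;, respectively.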