perl, character-encoding, iso-8859-1

How to set the substitution character for encoding into ISO-8859-1


The documentation for the Encode module says this about handling malformed characters while encoding:

CHECK = Encode::FB_DEFAULT ( == 0)

If CHECK is 0, encoding and decoding replace any malformed character with a substitution character. When you encode, SUBCHAR is used.
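A minimal sketch of the quoted behavior, assuming only the core Encode module. U+2660 (BLACK SPADE SUIT) cannot be represented in ISO-8859-1, so with CHECK set to Encode::FB_DEFAULT (0), or with CHECK omitted, it is replaced by the encoding's SUBCHAR. The octets are printed in hex so the result does not depend on the terminal's encoding.

```perl
use strict;
use warnings;
use Encode qw(encode);

# U+2660 (BLACK SPADE SUIT) has no ISO-8859-1 mapping;
# U+00E9 (LATIN SMALL LETTER E WITH ACUTE) does.
my $text = "ab\x{2660}d\x{E9}f";

# CHECK = Encode::FB_DEFAULT (0) behaves like omitting CHECK: every
# unmappable character is replaced by the encoding's SUBCHAR.
my $explicit = encode("iso-8859-1", $text, Encode::FB_DEFAULT());
my $implicit = encode("iso-8859-1", $text);

# Show the octets in hex so the output is independent of the terminal.
printf "%vX\n", $explicit;   # prints 61.62.3F.64.E9.66
printf "%vX\n", $implicit;   # prints 61.62.3F.64.E9.66
```

The 3F in the third position is "?", which is what ISO-8859-1 uses as its SUBCHAR.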

How can I specify, or at least query, the substitution character for a particular encoding? I'm interested in iso-8859-1.


Solution

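  • You can't; the substitution character is part of the encoding's definition, and for iso-8859-1 it is "?", as the first command below shows. You can get the same effect by passing a code reference as CHECK: Encode calls it with the code point of each character that cannot be encoded and inserts the octets it returns in that character's place.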

    $ perl -MEncode -E'say encode("iso-8859-1", "ab\x{2660}d\x{E9}f")' \
       | iconv -f iso-8859-1
    ab?déf
    
    $ perl -MEncode -E'say encode("iso-8859-1", "ab\x{2660}d\x{E9}f", sub { "*" })' \
       | iconv -f iso-8859-1
    ab*déf
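The callback approach can be wrapped in a reusable function. This is a sketch: encode_with_subchar is a hypothetical helper, not part of Encode. It relies on the documented coderef form of CHECK, where the code reference receives the ordinal of each unmappable character and returns the octets to emit instead, which also lets the replacement depend on the character.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper, not part of Encode: encode $string into $encoding,
# replacing every unmappable character with $subchar. $subchar is encoded
# up front because a CHECK callback used with encode must return octets.
# (If $subchar is itself unmappable, it becomes the default SUBCHAR.)
sub encode_with_subchar {
    my ($encoding, $string, $subchar) = @_;
    my $sub_octets = encode($encoding, $subchar);
    return encode($encoding, $string, sub { $sub_octets });
}

my $text = "ab\x{2660}d\x{E9}f";

# Fixed replacement, equivalent to the sub { "*" } one-liner.
my $starred = encode_with_subchar("iso-8859-1", $text, "*");

# The callback receives the code point, so the replacement can vary.
my $tagged = encode("iso-8859-1", $text, sub { sprintf "<U+%04X>", shift });

printf "%vX\n", $starred;             # prints 61.62.2A.64.E9.66
print substr($tagged, 0, 10), "\n";   # prints ab<U+2660>
```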