perl, utf8-decode

How To Detect Decoded String


I'm chasing a bug in Perl code that seems to fundamentally be a version of this:

"Cannot decode string with wide characters" appears on a weird place

Basically, under certain conditions, Encode::decode('utf8', $string) gets called twice on the same string, and hilarity ensues. The best solution would be to figure out which conditions cause the double decode and stop that from happening. Unfortunately, this is mature production code for a feature-rich product; identifying those conditions and fixing them without introducing regressions looks challenging.

Is there a fast, reliable way to detect whether a string has already been decoded from utf8? Inserting "if" statements before those calls feels a tad kludgy, but ought to be a pretty safe fix.


Solution

  • Encode has an is_utf8 function:

    is_utf8(STRING [, CHECK])

    [INTERNAL] Tests whether the UTF8 flag is turned on in the STRING. If CHECK is true, also checks the data in STRING for being well-formed UTF-8. Returns true if successful, false otherwise.

    Note that this function is documented under the heading "Messing with Perl's Internals", so its behavior might change in future Perl versions.
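
    A minimal sketch of the guard the question asks about, assuming the UTF8 flag is an acceptable stand-in for "already decoded" in your code base. The helper name decode_once is hypothetical (it is not part of Encode). The flag describes Perl's internal storage rather than the string's history, so a string can carry it without ever having passed through decode; treat this as a stopgap, not a guarantee.

    ```perl
    use strict;
    use warnings;
    use Encode qw(decode is_utf8);

    # Hypothetical helper, not part of Encode: decode only if the
    # UTF8 flag is off, so a second call leaves the string unchanged.
    sub decode_once {
        my ($string) = @_;
        return $string unless defined $string;
        # Assumption: a string with the flag on has already been decoded.
        return $string if is_utf8($string);
        return decode('utf8', $string);
    }

    my $octets  = "caf\xC3\xA9";          # UTF-8 bytes for "café"
    my $decoded = decode_once($octets);   # bytes become characters
    my $again   = decode_once($decoded);  # flag is on, so nothing happens
    ```

    Replacing the existing Encode::decode('utf8', $string) calls with such a wrapper keeps the change local to the call sites while the real cause of the double decode is tracked down.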