I've been having problems with "gremlins" from different encodings getting mixed into form input and database data in a Perl program. At first I wasn't decoding at all, so smart quotes and similar characters would turn into several gibberish characters; but blindly decoding everything as UTF-8 caused older Windows-1252 content to come out full of question marks.
So, I've used Encode::Detect::Detector and the decode() function to detect and decode all POST and GET input, along with data from a SQL database (the decoding process probably occurs on 10-20 strings of text each time a page is generated now). This seems to clean things up so UTF-8, ASCII and Windows-1252 content all display properly as UTF-8 output (as I've designated in the HTML headers):
use Encode qw(decode);
use Encode::Detect::Detector;
my $encoding_name = Encode::Detect::Detector::detect($value);
eval { $value = decode($encoding_name, $value) } if defined $encoding_name;
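To give a clearer picture of where this runs, here is a simplified sketch of how the per-value detection could be applied to every request parameter (the CGI.pm usage and the sniff_decode helper name are illustrative, not the exact code):

use strict;
use warnings;
use CGI;
use Encode qw(decode);
use Encode::Detect::Detector;

# Illustrative helper: sniff the encoding of a byte string and decode it
# to Perl characters, falling back to the raw bytes if detection or
# decoding fails.
sub sniff_decode {
    my ($octets) = @_;
    my $encoding_name = Encode::Detect::Detector::detect($octets);
    return $octets unless defined $encoding_name;
    my $decoded = eval { decode($encoding_name, $octets) };
    return defined $decoded ? $decoded : $octets;
}

# Run every POST/GET parameter through the helper.
my $q = CGI->new;
my %params;
for my $name ($q->param) {
    $params{$name} = sniff_decode(scalar $q->param($name));
}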
My question is this: how resource-heavy is this process? I haven't noticed a slowdown, so I'm happy with how this works, but if there's a more efficient way of doing it, I'd be glad to hear it.
The answer is highly application-dependent, so only you can decide whether the expense is acceptable.
The best way to quantify the overhead is to profile your code; you may want to give Devel::NYTProf a spin. Tim Bunce's YAPC::EU presentation provides more details about the module.
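If you just want a quick ballpark figure for the detect-and-decode step itself before reaching for a full profile, a Benchmark comparison along these lines gives a rough per-string cost (the sample bytes and labels below are only illustrative):

use Benchmark qw(cmpthese);
use Encode qw(decode);
use Encode::Detect::Detector;

# A short Windows-1252 byte string with "smart quotes" and an accented e.
my $octets = "caf\xe9 said \x93hello\x94";

cmpthese(-2, {
    # What the question does: sniff the encoding, then decode.
    detect_then_decode => sub {
        my $enc = Encode::Detect::Detector::detect($octets);
        return unless defined $enc;
        my $chars = eval { decode($enc, $octets) };
    },
    # Baseline: decode with a fixed, known encoding.
    assume_cp1252 => sub {
        my $chars = decode('cp1252', $octets);
    },
});

Profiling with Devel::NYTProf is still the better gauge of how much of a whole page request this step actually accounts for.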