
Getting rid of bold characters in a filename


MySQL recently reported the following error: [HY000][1366] Incorrect string value: '\xF0\x9D\x98\xBD\xF0\x9D...' for column 'name'

After investigating, I found that the offending value comes from a filename, which apparently contains bold characters: 4 𝘽𝘼𝙉𝘿𝙀 𝘼𝙉𝙉𝙊𝙉𝘾𝙀 - TV.mp4

Instead of changing the encoding of my database to accept such characters, I'd rather sanitize the value in PHP before inserting it. But I have no idea which operation I should run to end up with the following sanitized value: 4 BANDE ANNONCE - TV.mp4

Any help would be appreciated.


Solution

  • You can use PHP's iconv function to convert the string from one character encoding to another. In this case, try converting from UTF-8 to ASCII//TRANSLIT, which asks iconv to transliterate each non-ASCII character to its closest ASCII equivalent.

    Here's an example:

    function sanitize_string($input_string) {
        // Transliterate non-ASCII characters to their closest ASCII equivalents.
        // iconv() returns false if the conversion fails.
        return iconv("UTF-8", "ASCII//TRANSLIT", $input_string);
    }

    $filename = "4 𝘽𝘼𝙉𝘿𝙀 𝘼𝙉𝙉𝙊𝙉𝘾𝙀 - TV.mp4";
    $sanitized_filename = sanitize_string($filename);
    echo $sanitized_filename;
    

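    This should output 4 BANDE ANNONCE - TV.mp4, which is the sanitized value you're looking for. Note that //TRANSLIT behavior depends on the system's iconv implementation, so some platforms may emit ? or fail for characters they cannot transliterate.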
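
If iconv's transliteration is unavailable or inconsistent on your platform, another option is Unicode NFKC normalization, which maps compatibility characters such as these styled letters (they come from the Mathematical Alphanumeric Symbols block) to their plain counterparts. The sketch below assumes the intl extension is installed. `sanitize_filename_nfkc` is a hypothetical helper name, and stripping whatever non-ASCII characters remain is an assumption about what you want, not part of the answer above.

```php
<?php
// Sketch: fold "styled" Unicode letters to plain ASCII via NFKC normalization.
// Assumes the intl extension (Normalizer class) is available.
// sanitize_filename_nfkc is a hypothetical helper name used for illustration.
function sanitize_filename_nfkc(string $input): string
{
    // NFKC replaces compatibility characters (e.g. mathematical bold
    // letters) with their plain equivalents.
    $normalized = Normalizer::normalize($input, Normalizer::FORM_KC);
    if ($normalized === false) {
        // Normalization failed, typically because the input is not valid UTF-8.
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }

    // Assumption: anything still outside printable ASCII (emoji, etc.) is dropped.
    return (string) preg_replace('/[^\x20-\x7E]/u', '', $normalized);
}

$filename = "4 𝘽𝘼𝙉𝘿𝙀 𝘼𝙉𝙉𝙊𝙉𝘾𝙀 - TV.mp4";
echo sanitize_filename_nfkc($filename), PHP_EOL;
```

Unlike //TRANSLIT, the normalization step follows the Unicode standard rather than the local iconv build, so it should behave the same across platforms. Accented letters such as é are not decomposed to ASCII by NFKC and would be removed by the final filter, so adjust that step if you need to keep them.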