In short: I can't get a character like '𠂉' to be stored in a MySQL database, either through a text field in a Ruby on Rails app (with the default UTF-8 encoding) or by entering it directly in a MySQL GUI app.
As far as I can tell, all Chinese characters and radicals can be entered into the database without any problem, but these rarely typed 'character components' cannot. The character above is Unicode U+20089, HTML entity &#x20089; (decimal &#131209;).
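A quick way to see how this component differs from the characters that do work is to look at its UTF-8 bytes. This plain-Ruby sketch (no Rails needed) prints the encoding of U+20089 next to that of the radical 丿 (U+4E3F), which sits in the Basic Multilingual Plane; the variable names are only for illustration.

```ruby
# Compare the UTF-8 encoding of U+20089 with that of the radical 丿 (U+4E3F).
component = 0x20089.chr(Encoding::UTF_8) # the same string as "\u{20089}"
radical   = "\u4E3F"                     # 丿

[component, radical].each do |ch|
  bytes = ch.bytes.map { |b| format("%02X", b) }.join(" ")
  plane = ch.ord >> 16 # 0 = Basic Multilingual Plane, 2 = Supplementary Ideographic Plane
  puts format("U+%X: %d bytes (%s), plane %d, valid UTF-8: %s",
              ch.ord, ch.bytesize, bytes, plane, ch.valid_encoding?)
end
```

The component encodes to four bytes (F0 A0 82 89) and lies in plane 2, while the radical needs only three bytes (E4 B8 BF). Both are perfectly valid UTF-8.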
I can get it to display on the page by entering <html>&#x20089;</html> and turning off HTML escaping, but I would rather store it simply as the Unicode character and keep HTML escaping in place. Many other Chinese 'components' (parts of full characters, generally consisting of 2 or 3 strokes) cause the same problem.
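The escaping side of this can be checked in isolation. The sketch below uses the Ruby standard library's ERB::Util.html_escape as a stand-in for Rails' view escaping (an assumption: Rails ships its own version, which escapes the same five characters). It shows that escaping leaves the raw character untouched, whereas an entity typed in as text gets its & escaped and is displayed literally. That is why the entity only renders once escaping is switched off.

```ruby
require "erb"

component = "\u{20089}"

# Numeric character references for the same code point.
hex_entity = format("&#x%X;", component.ord) # hexadecimal form
dec_entity = format("&#%d;", component.ord)  # decimal form

# ERB::Util.html_escape rewrites only & < > " and ', so the raw
# character passes through escaping unchanged...
escaped_char = ERB::Util.html_escape(component)

# ...whereas an entity stored as text has its & turned into &amp;,
# so the browser shows the entity text instead of the character.
escaped_entity = ERB::Util.html_escape(hex_entity)

puts hex_entity, dec_entity, escaped_char == component, escaped_entity
```

In other words, keeping escaping on is compatible with the raw character. The remaining obstacle is getting the character into the database.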
According to this page, the character mentioned is in the UTF-8 charset: http://www.fileformat.info/info/unicode/char/20089/charset_support.htm
But on the neighboring '...20089/index.htm' page, there's an alert saying it's not a valid Unicode character.
For reference, that character can be found in Mac OS X through the Character Palette (International menu, "Show Character Palette"): search by radical and look under the '丿' radical.
Apologies if this is too open-ended, but: can a character like this be stored in a UTF-8-based database? And how can this character be both supported and unsupported, both present in the character set and not valid?
Which version of MySQL are you using? If it's before 5.5, you can't store that character: it takes four bytes in UTF-8, and MySQL's utf8 character set only supports up to three bytes per character (i.e., characters in the Basic Multilingual Plane). MySQL 5.5 added support for four-byte UTF-8, but you have to specify utf8mb4 as the character set.
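Before converting a column to utf8mb4, it can help to know which strings actually contain such characters. This is a minimal plain-Ruby sketch. The helper names are hypothetical (not part of Rails or MySQL). It flags characters above U+FFFF, which are exactly the ones that need four bytes in UTF-8.

```ruby
# Hypothetical helpers: find characters outside the Basic Multilingual Plane.
# Each one is 4 bytes in UTF-8, too long for MySQL's 3-byte utf8 but fine
# for utf8mb4.
def supplementary_chars(str)
  str.each_char.select { |c| c.ord > 0xFFFF }
end

def needs_utf8mb4?(str)
  !supplementary_chars(str).empty?
end

def supplementary_code_points(str)
  supplementary_chars(str).map { |c| format("U+%X", c.ord) }
end

puts needs_utf8mb4?("\u4E3F")          # radical 丿 only
puts needs_utf8mb4?("\u4E3F\u{20089}") # radical plus the component
p supplementary_code_points("\u4E3F\u{20089}")
```

Running a check like this over existing data (or in a model validation) shows which values a 3-byte utf8 column would reject. For a Rails app, the connection encoding presumably also needs to be utf8mb4 (the mysql2 adapter takes an encoding setting in database.yml), not only the column's character set.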
ref: http://dev.mysql.com/doc/refman/5.5/en/charset-unicode.html