java unicode transcoding

How to transcode EUC-JP to Unicode in a way that can be displayed to the user?


Hello, I'm building a simple Android app that lets the user look up a kanji and see its "parts". To do that, I'm reading from a data file that I found on this website and that contains the kanji parts I need to display.

The problem is that it's not encoded in Unicode, and I have yet to find a program that can display the file's contents properly. I'm not entirely sure what the encoding is, but I suspect it is CP932.

How can I transcode the file into something that can be displayed to the user (and manipulated by Java)?

Here is a sample of the contents:

±ú : Ñá
±û : ¥Î °ì Âç ÑÌ
±ü : Âç ÊÆ ÑÄ
±ý : ²¦ Ц ×Æ
±þ : ¿´ Öø
²¡ : ¡Ã Æü Ù© ÅÄ
²¢ : ²¦ Æü
²£ : ¡Ã ²« ÅÄ Æó Æü ¥Ï ÌÚ ×°
²¤ : ·ç Ò¹ ¥Î Ц
²¥ : ¥Î Ц Ò¹ ÝÕ ÑÜ Ëô
²¦ : ²¦
²§ : ±© ¥Ï ÑÒ ÒÓ
²¨ : ½é Âç ÊÆ ÑÄ ÈÐ
²© : ¾° Ä» ÑÌ Û¿
²ª : Ä» Ò¹ Û¿ ¥Î Ц
²« : ²« ÅÄ ¥Ï
²¬ : Öõ ÑÄ °ì »³ ²¬
²­ : ¡Ã ½Á ¸ý
²® : ²Ð ÈÈ çè
²¯ : ²» ²½ ¿´ Æü Ω
²° : »ê ÅÚ ÒÓ Õù
²± : ²» ¿´ Æü Ë» Ω
²² : ²» ·î ¿´ Æü Ω
²³ : ÌÚ ÍÑ ¥Þ
²´ : µí ÅÚ
²µ : ²µ
²¶ : ²µ ²½ Âç ±â
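A quick way to test an encoding guess is to decode the same raw bytes with each candidate charset and see which one produces kanji. This sketch assumes the sample above is the file's bytes rendered as ISO-8859-1 (or Windows-1252), so that "²¦" corresponds to the two bytes 0xB2 0xA6. The class name GuessEncoding is hypothetical.

```java
import java.nio.charset.Charset;

public class GuessEncoding {

    // Decode raw bytes with the named charset.
    // Returns null if this Java runtime does not ship that charset.
    static String decode(byte[] bytes, String charsetName) {
        if (!Charset.isSupported(charsetName)) {
            return null;
        }
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // Assumed raw bytes behind "²¦" in the sample (0xB2 0xA6).
        byte[] sample = {(byte) 0xB2, (byte) 0xA6};

        // windows-31j is Java's name for CP932.
        for (String cs : new String[] {"ISO-8859-1", "windows-31j", "EUC-JP"}) {
            System.out.println(cs + " -> " + decode(sample, cs));
        }
    }
}
```

Decoded as EUC-JP the pair gives 王, whereas a Shift-JIS/CP932 decoder treats each byte in the range 0xA1 to 0xDF as a single half-width katakana character, so a CP932 guess yields katakana rather than kanji for this file.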

Solution

  • The file is encoded in EUC-JP, not Shift-JIS or CP932. You can either convert the file to a Unicode encoding such as UTF-8 offline with a tool like iconv, or decode it at runtime by specifying the charset when you create the InputStreamReader that reads the file.
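A minimal sketch of both options in Java, assuming every data line has the "kanji : part part ..." layout shown in the sample. KanjiParts, readParts and convertToUtf8 are hypothetical names. readParts takes an InputStream, so on Android it could be fed from AssetManager.open(); convertToUtf8 uses java.nio.file and is meant as a one-off conversion on a desktop JVM. EUC-JP is not one of the six charsets every Java implementation must support, so Charset.forName("EUC-JP") throws UnsupportedCharsetException on a runtime that lacks it.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class KanjiParts {

    // Not available via StandardCharsets, so look it up by name.
    static final Charset EUC_JP = Charset.forName("EUC-JP");

    // Option 1: decode at runtime. Reads "kanji : part part ..." lines
    // from an EUC-JP stream; lines without " : " are skipped.
    static Map<String, List<String>> readParts(InputStream in) throws IOException {
        Map<String, List<String>> parts = new LinkedHashMap<>();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, EUC_JP))) {
            String line;
            while ((line = reader.readLine()) != null) {
                int sep = line.indexOf(" : ");
                if (sep < 0) {
                    continue;
                }
                String kanji = line.substring(0, sep).trim();
                String rest = line.substring(sep + 3).trim();
                List<String> list = new ArrayList<>();
                if (!rest.isEmpty()) {
                    list.addAll(Arrays.asList(rest.split("\\s+")));
                }
                parts.put(kanji, list);
            }
        }
        return parts;
    }

    // Option 2: convert once. Re-encodes the whole file from EUC-JP to UTF-8.
    static void convertToUtf8(Path source, Path target) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(source, EUC_JP);
             Writer writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            char[] buf = new char[8192];
            int n;
            while ((n = reader.read(buf)) != -1) {
                writer.write(buf, 0, n);
            }
        }
    }
}
```

After a one-off conversion, the app can read the UTF-8 copy with StandardCharsets.UTF_8 like any other text file; decoding at runtime avoids keeping a second copy but still depends on the runtime providing EUC-JP.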