character-encoding cjk iconv shift-jis

Halfwidth vs Fullwidth Forms in JIS X 0208


I am trying to make sense of the following explanation on the Wikipedia page:

ASCII and JISCII punctuation (shown here with a yellow background) may use alternative mappings to the Halfwidth and Fullwidth Forms block if used in an encoding which combines JIS X 0208 with ASCII or with JIS X 0201, such as Shift JIS, EUC-JP or ISO 2022-JP.

On my Ubuntu system I can verify that this byte sequence:

$ printf "\x1b\x24\x42\x24\x22\x21\x40\n" | hexdump -C
00000000  1b 24 42 24 22 21 40 0a                           |.$B$"!@.|
00000008

gives, once converted with iconv:

$ printf "\x1b\x24\x42\x24\x22\x21\x40\n" | iconv -f iso-2022-jp -t utf8
あ＼

And we can verify the last character is U+FF3C FULLWIDTH REVERSE SOLIDUS using something like:

$ printf "\x1b\x24\x42\x24\x22\x21\x40\n" | iconv -f iso-2022-jp -t utf16be | hexdump -C
00000000  30 42 ff 3c 00 0a                                 |0B.<..|
00000006

My question is: how do I switch to the "halfwidth form"? I am asking about JIS X 0208 itself, not ISO-2022-JP. I'd like to understand which escape sequence, if any, switches to halfwidth according to the JIS X 0208 specification.


Updated.

Since U+FF3C FULLWIDTH REVERSE SOLIDUS is a special case for JIS X 0201 (where 0x5C is ¥), how about U+FF0F FULLWIDTH SOLIDUS? Let's consider the following input: Ａ／ア￥A/ｱ¥.

I would be tempted to say this is written as:

$ printf "\x1b\x24\x42\x23\x41\x21\x3f\x25\x22\x21\x6f\x1b\x28\x4a\x41\x2f\xb1\x5c" | hexdump -C
00000000  1b 24 42 23 41 21 3f 25  22 21 6f 1b 28 4a 41 2f  |.$B#A!?%"!o.(JA/|
00000010  b1 5c                                             |.\|
00000012

Could someone confirm that the only way to get halfwidth forms alongside JIS X 0208 is to use an escape sequence that switches to JIS X 0201:

  • ESC 02/08 04/10 (that is, ESC ( J)

Solution

  • I wouldn't expect this to be possible. The halfwidth characters generally come from how Shift JIS (and similar encodings) interpret JIS X 0201 characters. They're not actually part of the JIS X 0208 character set. There isn't a "halfwidth mode" so much as a "JIS X 0201 mode", whose characters are generally taken to be halfwidth.

    JIS X 0201 doesn't have a \ (REVERSE SOLIDUS) character. It was replaced by ¥ (YEN SIGN), to the great frustration of C programmers. So I wouldn't expect there to be a way to encode a hypothetical "HALFWIDTH REVERSE SOLIDUS" in JIS X 0208. Unicode has no such character either, and these are related facts.

    It might be possible to encode this in IBM Code Page 896 as 0x63, or in some of the other IBM Japanese code pages. But I wouldn't expect it in JIS X 0208.
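
The "JIS X 0201 mode" point can be checked by decoding the same byte 0x5C under different ISO-2022-JP designations. This is only a minimal sketch: it assumes an iconv that supports ISO-2022-JP (glibc's does) and od from coreutils, and decode is a hypothetical helper name, not part of any tool.

```shell
#!/bin/sh
# decode: hypothetical helper. Expands the octal escapes in $1 (printf format),
# decodes the bytes as ISO-2022-JP, and prints the UTF-16BE code units as hex.
decode() {
    printf "$1" | iconv -f ISO-2022-JP -t UTF-16BE | od -An -tx1 | tr -d ' \n'
    echo
}

# ESC $ B designates JIS X 0208: the byte pair 21 40 is its reverse solidus.
decode '\033$B\041\100\033(B'   # ff3c with glibc (U+FF3C, fullwidth)

# ESC ( B designates ASCII: the single byte 5C is REVERSE SOLIDUS.
decode '\033(B\134'             # 005c

# ESC ( J designates JIS X 0201 Roman: the same byte 5C is YEN SIGN.
decode '\033(J\134\033(B'       # 00a5
```

No designation here yields a narrow backslash from JIS X 0208 or JIS X 0201; the only single-byte \ comes from switching to ASCII with ESC ( B.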