UTF16/32 Test Case (Need Negative Test Case)

I need a test case for testing (and ideally breaking) conversions between UTF-32 and UTF-16.

For UTF-8 and UTF-16, I generally use the 'Chinese Bone' test (U+9AA8): 0xE9 0xAA 0xA8 (UTF-8) and 0x9AA8 (UTF-16).
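
As a quick sanity check of those byte values, here is a minimal Python 3 sketch using the built-in codecs. The variable names are only illustrative, and "utf-16-be" is chosen so that no BOM is emitted:

```python
# Sanity check of the "Chinese Bone" values (U+9AA8) with Python 3's built-in codecs.
bone = "\u9aa8"

utf8_bytes = bone.encode("utf-8")       # expected: E9 AA A8
utf16_bytes = bone.encode("utf-16-be")  # big-endian, no BOM; expected: 9A A8

print(utf8_bytes.hex())
print(utf16_bytes.hex())
```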

Does anyone have a negative test case that should break a poorly written implementation for UTF-16 and UTF-32? Ideally, the test will require use of at least two UTF-32 values.

Jeff

Solution

Not sure exactly what you mean, but here are some:

UTF-16

Lead surrogate followed by a regular unit or by another lead surrogate: \xD8\x00\x00\x00 or \xD8\x00\xDB\xFF
Trail surrogate without a lead surrogate before it: \x00\x61\xDC\x00
Trail surrogate in lead position: \xDF\xFF\xDB\xFF
Lead surrogate as the last unit: \xD8\x01<EOF>
Lead surrogate as the last unit, followed by half of a trail surrogate (a single stray byte): '\xD8\x00\xDC'.decode('utf-16be'). Python 2.7.3 has a bug with this input.
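
The cases above can be collected into a small negative test table. The following is only a sketch in Python 3, assuming the bytes are big-endian (UTF-16BE) as in the decode call above. The validator `utf16be_is_well_formed` is a hypothetical helper written for illustration, not part of any library; substitute your own converter for it.

```python
def utf16be_is_well_formed(data: bytes) -> bool:
    """Return True only if data is a well-formed UTF-16BE byte sequence."""
    if len(data) % 2 != 0:
        return False  # odd byte count: the last code unit is truncated
    units = [(data[i] << 8) | data[i + 1] for i in range(0, len(data), 2)]
    i = 0
    while i < len(units):
        unit = units[i]
        if 0xD800 <= unit <= 0xDBFF:
            # Lead surrogate: the next unit must exist and be a trail surrogate.
            if i + 1 >= len(units) or not (0xDC00 <= units[i + 1] <= 0xDFFF):
                return False
            i += 2
        elif 0xDC00 <= unit <= 0xDFFF:
            return False  # trail surrogate with no lead surrogate before it
        else:
            i += 1
    return True


# Every entry here is malformed and should be rejected.
NEGATIVE_CASES = [
    b"\xD8\x00\x00\x00",  # lead surrogate followed by a regular unit
    b"\xD8\x00\xDB\xFF",  # lead surrogate followed by another lead surrogate
    b"\x00\x61\xDC\x00",  # trail surrogate without a lead surrogate
    b"\xDF\xFF\xDB\xFF",  # trail surrogate in lead position
    b"\xD8\x01",          # lead surrogate as the last unit
    b"\xD8\x00\xDC",      # lead surrogate plus half of a trail surrogate
]

for case in NEGATIVE_CASES:
    print(case.hex(), utf16be_is_well_formed(case))
```

A well-formed pair such as \xD8\x00\xDC\x00 (U+10000) makes a useful positive control next to these.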

UTF-32

A unit value is invalid (the check should return true) when value < 0, value > 0x10FFFF, or 0xD800 <= value && value <= 0xDFFF.
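
That condition could be written as a predicate along these lines. This is a sketch in Python 3, and the name `is_invalid_utf32_unit` is made up for illustration. The value < 0 branch only matters when the unit is read as a signed 32-bit integer.

```python
def is_invalid_utf32_unit(value: int) -> bool:
    """Return True if value is not a valid Unicode scalar value."""
    return value < 0 or value > 0x10FFFF or 0xD800 <= value <= 0xDFFF


# Boundary values worth putting in a test table: (value, expected_invalid).
UTF32_CASES = [
    (-1, True),         # negative when read as a signed integer
    (0x0000, False),
    (0xD7FF, False),    # last value before the surrogate range
    (0xD800, True),     # first surrogate
    (0xDFFF, True),     # last surrogate
    (0xE000, False),    # first value after the surrogate range
    (0x10FFFF, False),  # highest valid code point
    (0x110000, True),   # first value past the Unicode range
]

for value, expected in UTF32_CASES:
    print(hex(value), is_invalid_utf32_unit(value), expected)
```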