java utf-8 keyboard non-printing-characters

How to get control characters from a console input string

I have looked through the suggested "already answered" questions for this. Mostly they want simply to discard such "non-printable" input. I want to use it.

I am getting a UTF8 String returned from keyboard input using

BufferedReader br = new BufferedReader( new InputStreamReader(System.in, "UTF-8" ));
String response = br.readLine();

and I am interested in identifying whether the user has input, for example, up-arrow or down-arrow as one of their keystrokes.

Iterating through the chars in this String I find that down-arrow translates to (int value for char) 27, 91, 66, i.e. 3 chars. The first value corresponds to Escape. It seems therefore that this is not a matter of identifying a single Character and finding out whether it is non-printable.

Also I'm not clear why this control character can't be printed out as a single UTF8 character, but instead prints out as the 3 component parts of the UTF8 character: does this mean that when you iterate through a String you are in fact getting its contents byte-by-byte?

I just wonder if there is any documented or clever way of doing this (finding and identifying control characters) in a given UTF8 String. Perhaps Apache Commons. Or perhaps in Groovy (which I am in fact using, rather than Java)?

Solution

You can test for a real control character using the Character::isISOControl methods (javadoc).
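As a minimal sketch of that check, the following scans a string and reports where the ISO control characters sit. The class and method names (ControlCharScan, controlCharIndices) are made up for illustration; only Character.isISOControl comes from the JDK.

```java
import java.util.ArrayList;
import java.util.List;

class ControlCharScan {
    // Hypothetical helper: returns the index of every char in s that
    // Character.isISOControl reports as a control character
    // (U+0000..U+001F and U+007F..U+009F).
    static List<Integer> controlCharIndices(String s) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < s.length(); i++) {
            if (Character.isISOControl(s.charAt(i))) {
                hits.add(i);
            }
        }
        return hits;
    }
}
```

For the down-arrow input described in the question (chars 27, 91, 66), only the first char (27, Escape) is reported; the '[' and 'B' that follow are ordinary printable characters.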

However, as noted in the comments, up-arrow and down-arrow are keystrokes rather than characters. What they actually produce in the input stream is platform dependent. For example, if you are using an ANSI-compliant terminal or terminal emulator, an up-arrow will be mapped to the sequence ESC [ A. These are three separate characters (27, 91, 65), not the bytes of a single multi-byte UTF-8 character, so iterating over the String is not giving you bytes. If you simply filter out the ISO control characters, you will remove the ESC only.

I don't think there is a reliable platform-independent way to filter out the junk that results from a user mistakenly typing arrow keys. For a platform-specific solution, you need to understand what specific sequences are produced by the user's input device. Then you detect and remove (or interpret) those sequences.
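One platform-specific approach might look like the sketch below. It assumes the terminal sends ANSI-style sequences of the form ESC [ followed by a final letter, as observed in the question (27, 91, 66 for down-arrow), and that those sequences actually reach readLine(). Other terminals or terminal modes may send different sequences. The class name AnsiKeys and both method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnsiKeys {
    // ANSI CSI sequence: ESC '[', optional parameter bytes (0x30-0x3F),
    // optional intermediate bytes (0x20-0x2F), then one final byte (0x40-0x7E).
    static final Pattern CSI =
            Pattern.compile("\u001B\\[([0-?]*)([ -/]*)([@-~])");

    // Hypothetical helper: lists the plain arrow keys found in the line, in order.
    static List<String> arrowKeys(String line) {
        List<String> keys = new ArrayList<>();
        Matcher m = CSI.matcher(line);
        while (m.find()) {
            // Sequences with parameters (e.g. modified keys) are skipped here.
            if (!m.group(1).isEmpty() || !m.group(2).isEmpty()) {
                continue;
            }
            switch (m.group(3).charAt(0)) {
                case 'A': keys.add("UP"); break;
                case 'B': keys.add("DOWN"); break;
                case 'C': keys.add("RIGHT"); break;
                case 'D': keys.add("LEFT"); break;
                default: break; // some other CSI sequence; ignored
            }
        }
        return keys;
    }

    // Hypothetical helper: removes every CSI sequence, leaving the typed text.
    static String stripCsi(String line) {
        return CSI.matcher(line).replaceAll("");
    }
}
```

With the response string from the question, AnsiKeys.arrowKeys(response) would identify the arrow keystrokes and AnsiKeys.stripCsi(response) would return the text without them. Matching the whole sequence rather than filtering single characters avoids leaving a stray "[B" behind.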