Search code examples
unicodenullpointerexceptionfileinputstream

Null pointer Exception in the case of reading in emoticons


I have a text file that looks like this:

shooting-stars  💫  "are cool" 

I have a lexical analyzer that uses FileInputStream to read the characters one at a time, passing those characters to a switch statement that returns the corresponding lexeme.

In this case, 💫 represents assignment so this case passes:

case 'ð' :
            return new Lexeme("ASSIGN");

For some reason, the file reader stops at that point, returning a null pointer even though it has yet to process the string (or whatever is after the 💫). Any time it reads in an emoticon it does this. If there were no emoticons, it gets to the end of file. Any ideas?


Solution

  • I suspect the problem is that the character 💫 (Unicode code point U+1F4AB) is outside the range of characters that Java represents internally as single char values. Instead, Java represents characters above U+FFFF as two characters known as surrogate pairs, in this case U+D83D followed by U+DCAB. (See this thread for more info and some links.)

    It's hard to know exactly what's going on with the little bit of code that you presented, but my guess is that you are not handling this situation correctly. You will need to adjust your processing logic to deal with your emoticons arriving in two pieces.