Tags: c#, java, encoding, antlr, ansi

How do I get this encoding right with ANTLR?


I'm working on a school project: a static code analyzer. One requirement is to analyse C# code from Java, which has gone well so far with ANTLR.

I wrote some example C# code in Visual Studio to scan with ANTLR, and I analyse every C# file in the solution. But it does not work. I seem to get a memory leak, with this error message:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at org.antlr.runtime.Lexer.emit(Lexer.java:151)
    at org.antlr.runtime.Lexer.nextToken(Lexer.java:86)
    at org.antlr.runtime.CommonTokenStream.fillBuffer(CommonTokenStream.java:119)
    at org.antlr.runtime.CommonTokenStream.LT(CommonTokenStream.java:238)

After a while I suspected an encoding issue, because all the files are in UTF-8 and I think ANTLR can't read the encoded stream. So I opened Notepad++ and changed the encoding of every file to ANSI, and then it worked. I don't really understand what ANSI means: is it a character set or some kind of organisation?

I want to convert the files from whatever encoding they have (probably UTF-8) to this ANSI encoding so I won't get memory leaks anymore.

This is the code that makes the Lexer and Parser:

InputStream inputStream = new FileInputStream(new File(filePath));
CharStream charStream = new ANTLRInputStream(inputStream);
CSharpLexer cSharpLexer = new CSharpLexer(charStream);
CommonTokenStream commonTokenStream = new CommonTokenStream(cSharpLexer);
CSharpParser cSharpParser = new CSharpParser(commonTokenStream);
  • Does anyone know how to change the encoding of the InputStream to the right encoding?
  • And what does Notepad++ do when I change the encoding to ANSI?

Solution

  • I solved this issue by wrapping the InputStream in a BufferedInputStream and then removing the byte order mark (BOM).

    I guess my parser didn't like that encoding, because I had also tried setting the encoding explicitly.
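
The fix described above can be sketched roughly as follows. This is not the original poster's code, only a minimal illustration that assumes the files start with the UTF-8 byte order mark (the bytes EF BB BF), which Windows editors such as Visual Studio commonly write. The class name `BomStripper` and the method `skipUtf8Bom` are hypothetical names chosen for this example. It uses only the JDK, so it runs without ANTLR on the classpath.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of ANTLR or of the original project.
final class BomStripper {
    private BomStripper() {}

    /**
     * Wraps the given stream in a BufferedInputStream and skips a leading
     * UTF-8 byte order mark (EF BB BF) if one is present. If the stream
     * does not start with a BOM, no bytes are consumed.
     */
    static InputStream skipUtf8Bom(InputStream raw) throws IOException {
        BufferedInputStream in = new BufferedInputStream(raw);
        in.mark(3); // remember the start; we look at no more than 3 bytes
        boolean hasBom = in.read() == 0xEF
                && in.read() == 0xBB
                && in.read() == 0xBF;
        if (!hasBom) {
            in.reset(); // not a BOM: rewind so the caller sees every byte
        }
        return in;
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for a C# source file saved as "UTF-8 with BOM".
        byte[] sample = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'i', 'n', 't'};
        InputStream clean = skipUtf8Bom(new ByteArrayInputStream(sample));
        // The first byte the consumer sees is now the 'i' of "int".
        System.out.println((char) clean.read());
    }
}
```

In the code from the question, the cleaned stream would take the place of the raw `FileInputStream`, for example `new ANTLRInputStream(BomStripper.skipUtf8Bom(inputStream))`. Note that the one-argument `ANTLRInputStream` constructor decodes bytes with the platform default charset, so non-ASCII characters in UTF-8 files may still be misread unless an encoding is also passed in.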