I wrote my own grammar to parse chess PGN files. It compiles fine (with the antlr4 command), but I can't manage to parse PGN files with it.
Pgn.g4 (ANTLR 4 grammar):
grammar Pgn;
file: game (NEWLINE+ game)*;
game: (tag+ NEWLINE+)? notation;
tag: '['TAG_TYPE TAG_VALUE']';
notation: move+ END_RESULT?;
move: MOVE_NUMBER'.' MOVE_DESC MOVE_DESC #CompleteMove
| MOVE_NUMBER'.' MOVE_DESC #OnlyWhiteMove
| MOVE_NUMBER'...' MOVE_DESC #OnlyBlackMove
| MOVE_NUMBER'.' MOVE_DESC MOVE_DESC '(' move+ ')' #CompleteMoveWithVariant
| MOVE_NUMBER'.' MOVE_DESC #OnlyWhiteMoveWithVariant
| MOVE_NUMBER'...' MOVE_DESC #OnlyBlackMoveWithVariant
;
END_RESULT: '1-0'
| '0-1'
| '1/2-1/2'
| '*'
;
TAG_TYPE: LETTER+;
TAG_VALUE: '"'[:print:]*'"';
MOVE_NUMBER: DIGIT+;
MOVE_DESC: [:print:];
NEWLINE: '\r'? '\n';
SPACES: [ \t]+ -> skip;
fragment LETTER: [a-zA-Z];
fragment DIGIT: [0-9];
My test file (Launcher.java):
package com.gmail.bernabe.laurent.j2se.parsing_pgn_test;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import javax.swing.JFileChooser;
import javax.swing.filechooser.FileNameExtensionFilter;
import org.antlr.v4.runtime.ANTLRInputStream;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.tree.ParseTree;
import com.gmail.bernabe.laurent.j2se.parsing_pgn_test.pgn.PgnLexer;
import com.gmail.bernabe.laurent.j2se.parsing_pgn_test.pgn.PgnParser;
public class Launcher {

    public static void main(String[] args) throws FileNotFoundException, IOException {
        JFileChooser fileChooser = new JFileChooser();
        fileChooser.setAcceptAllFileFilterUsed(false);
        fileChooser.addChoosableFileFilter(new FileNameExtensionFilter(
                "Portable Game Notation (*.pgn)", new String[]{"pgn"}));
        if (fileChooser.showOpenDialog(null) == JFileChooser.APPROVE_OPTION) {
            ANTLRInputStream inStream = new ANTLRInputStream(
                    new FileInputStream(fileChooser.getSelectedFile())
            );
            PgnLexer lexer = new PgnLexer(inStream);
            CommonTokenStream tokenStream = new CommonTokenStream(lexer);
            PgnParser parser = new PgnParser(tokenStream);
            ParseTree tree = parser.file();
            System.out.println(tree.toStringTree(parser));
        }
    }
}
I created four sample PGN files (generated with the ChessX program) for the sake of completeness, but I tested with only two of them: DebutUltraSimple.pgn and FinaleUltraSimple.pgn. The remaining test files are Scandinave.pgn and test.pgn.
The error output given by DebutUltraSimple.pgn:
line 1:7 token recognition error at: '"?'
line 1:9 token recognition error at: '"]'
line 1:11 mismatched input '\n' expecting TAG_VALUE
line 2:0 extraneous input '[' expecting {MOVE_NUMBER, NEWLINE}
line 2:6 token recognition error at: '"?'
line 2:8 token recognition error at: '"]'
line 3:0 extraneous input '[' expecting {MOVE_NUMBER, NEWLINE}
line 3:6 token recognition error at: '"?'
line 3:8 token recognition error at: '?'
line 3:9 token recognition error at: '?'
line 3:10 token recognition error at: '?'
line 3:12 token recognition error at: '?'
line 3:13 token recognition error at: '?'
line 3:15 token recognition error at: '?'
line 3:16 token recognition error at: '?'
line 3:17 token recognition error at: '"]'
line 4:0 extraneous input '[' expecting {MOVE_NUMBER, NEWLINE}
line 4:7 token recognition error at: '"?'
line 4:9 token recognition error at: '"]'
line 5:0 extraneous input '[' expecting {MOVE_NUMBER, NEWLINE}
line 5:7 token recognition error at: '"?'
line 5:9 token recognition error at: '"]'
line 6:0 extraneous input '[' expecting {MOVE_NUMBER, NEWLINE}
line 6:7 token recognition error at: '"?'
line 6:9 token recognition error at: '"]'
line 7:0 extraneous input '[' expecting {MOVE_NUMBER, NEWLINE}
line 7:8 token recognition error at: '"*'
line 7:10 token recognition error at: '"]'
line 8:0 extraneous input '[' expecting {MOVE_NUMBER, NEWLINE}
line 8:5 token recognition error at: '"C'
line 8:9 token recognition error at: '"]'
line 10:3 no viable alternative at input '1.e'
(file (game (tag [ Event) \n [ Site \n [ Date . . \n [ Round \n [ White \n [ Black \n [ Result \n [ ECO (notation move (move 40))) \n \n (game (notation move (move 1 . e) move (move 4 e) move (move 5) (move 2 . Nf) move (move 3 Nc) move (move 6) *)))
And the error output given by FinaleUltraSimple.pgn:
line 1:7 token recognition error at: '"tra'
line 1:16 token recognition error at: '"]'
line 1:11 mismatched input 'ining' expecting TAG_VALUE
line 2:0 extraneous input '[' expecting {MOVE_NUMBER, NEWLINE}
line 2:6 token recognition error at: '"?'
line 2:8 token recognition error at: '"]'
line 3:0 extraneous input '[' expecting {MOVE_NUMBER, NEWLINE}
line 3:6 token recognition error at: '"?'
line 3:8 token recognition error at: '?'
line 3:9 token recognition error at: '?'
line 3:10 token recognition error at: '?'
line 3:12 token recognition error at: '?'
line 3:13 token recognition error at: '?'
line 3:15 token recognition error at: '?'
line 3:16 token recognition error at: '?'
line 3:17 token recognition error at: '"]'
line 4:0 extraneous input '[' expecting {MOVE_NUMBER, NEWLINE}
line 4:7 token recognition error at: '"?'
line 4:9 token recognition error at: '"]'
line 5:0 extraneous input '[' expecting {MOVE_NUMBER, NEWLINE}
line 5:7 token recognition error at: '"w'
line 5:13 token recognition error at: '_'
line 5:21 token recognition error at: '"]'
line 6:0 extraneous input '[' expecting {MOVE_NUMBER, NEWLINE}
line 6:7 token recognition error at: '"b'
line 6:13 token recognition error at: '_'
line 6:21 token recognition error at: '"]'
line 7:0 extraneous input '[' expecting {MOVE_NUMBER, NEWLINE}
line 7:8 token recognition error at: '"1'
line 7:10 token recognition error at: '/'
line 7:12 token recognition error at: '-'
line 7:14 token recognition error at: '/'
line 7:16 token recognition error at: '"]'
line 8:5 token recognition error at: '"4'
line 8:7 mismatched input 'k' expecting TAG_VALUE
line 8:9 token recognition error at: '/'
line 8:11 token recognition error at: '/'
line 8:16 token recognition error at: '/'
line 8:18 token recognition error at: '/'
line 8:20 token recognition error at: '/'
line 8:22 token recognition error at: '/'
line 8:24 token recognition error at: '/'
line 8:29 token recognition error at: '-'
line 8:31 token recognition error at: '-'
line 8:36 token recognition error at: '"]'
line 9:0 extraneous input '[' expecting {MOVE_NUMBER, NEWLINE}
line 9:7 token recognition error at: '"1'
line 9:9 token recognition error at: '"]'
line 11:5 no viable alternative at input '1...Kf'
line 11:35 token recognition error at: '='
(file (game (tag [ Event ining) \n [ Site \n [ Date . . \n [ Round \n [ White hite trainer \n [ Black lack trainer \n [ Result (notation move (move 2) (move 1) (move 2))) \n (game (tag [ FEN k 3 8 4 KP 2 8 8 8 8 8 b 0 1) \n [ Setup \n \n (notation move (move 1 ... Kf) move (move 8) (move 2 . f) move (move 7 Kg) move (move 7) (move 3 . Ke) move (move 7 Kg) move (move 6) (move 4 . f) move (move 8 Q) 1/2-1/2)))
A link to the generated source code:
The Eclipse project (zipped) I used for testing.
Any help or advice would be much appreciated.
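Where did you find the construction "[:print:]"? ANTLR 4 has no POSIX character classes: inside a lexer rule, [:print:] is just the set of the literal characters ':', 'p', 'r', 'i', 'n' and 't', which is why the lexer rejects '?' and most other characters in your tag values.
You could use a negated set instead, like this:
TAG_VALUE: '"' ~["]* '"';
And of course you must also change the MOVE_DESC token, which currently matches only a single character from that same set.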
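To make the point about the set concrete, here is a minimal sketch in plain Java, without ANTLR. Java's regex engine happens to read [:print:] the same way, as a list of literal characters, so it serves as an analogy only and is not ANTLR itself. The class and method names (CharSetDemo, inLiteralSet, unquote) are made up for this illustration, and the QUOTED pattern is a regex counterpart of the negated-set rule, not generated lexer code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the Pgn grammar or of ANTLR.
final class CharSetDemo {
    // No POSIX expansion happens here: the set holds only ':', 'p', 'r', 'i', 'n', 't'.
    static final Pattern LITERAL_SET = Pattern.compile("[:print:]");

    // Regex counterpart of the lexer rule  '"' ~["]* '"'
    static final Pattern QUOTED = Pattern.compile("\"([^\"]*)\"");

    static boolean inLiteralSet(char c) {
        return LITERAL_SET.matcher(String.valueOf(c)).matches();
    }

    // Returns the text between the quotes, or null if the input is not a quoted value.
    static String unquote(String s) {
        Matcher m = QUOTED.matcher(s);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(inLiteralSet('t'));             // true: 't' is listed in the set
        System.out.println(inLiteralSet('?'));             // false: '?' is not listed
        System.out.println(unquote("\"white_trainer\""));  // white_trainer
    }
}
```

The negated set accepts any character except the closing quote, so values such as "?" or "white_trainer" pass as a single token.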
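You also have a problem in the game rule: it expects all the tags back to back, followed by NEWLINE+, but in PGN files every tag is on its own line, so a NEWLINE follows each tag before the notation starts. One possible fix is to write (tag NEWLINE+)* in place of (tag+ NEWLINE+)?. The rules in question: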
game: (tag+ NEWLINE+)? notation;
tag: '['TAG_TYPE TAG_VALUE']';
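The layout the grammar has to accept (one tag per line, then the moves) can be sketched in plain Java, independent of ANTLR. This is only an illustration of the input structure under that assumption, not a replacement for the parser. The class name TagSectionDemo, its fields, and the sample text in main are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical line-based reader for illustration; it does not validate the moves.
final class TagSectionDemo {
    // One tag pair per line: [Name "value"]
    private static final Pattern TAG_LINE =
            Pattern.compile("\\[\\s*([A-Za-z]+)\\s+\"([^\"]*)\"\\s*\\]");

    final Map<String, String> tags = new LinkedHashMap<>();
    final StringBuilder movetext = new StringBuilder();

    static TagSectionDemo read(String game) {
        TagSectionDemo result = new TagSectionDemo();
        for (String line : game.split("\\r?\\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // blank lines only separate the sections
            }
            Matcher m = TAG_LINE.matcher(trimmed);
            if (m.matches()) {
                // Every tag line ends with its own line break.
                result.tags.put(m.group(1), m.group(2));
            } else {
                // Anything else is treated as move text and joined with spaces.
                if (result.movetext.length() > 0) {
                    result.movetext.append(' ');
                }
                result.movetext.append(trimmed);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample input in the usual tag-per-line layout.
        String sample = "[Event \"?\"]\n[Result \"*\"]\n\n1.e4 e5 2.Nf3 Nc6 *\n";
        TagSectionDemo g = read(sample);
        System.out.println(g.tags);      // {Event=?, Result=*}
        System.out.println(g.movetext);  // 1.e4 e5 2.Nf3 Nc6 *
    }
}
```

The loop consumes a line break after every tag, which is what the game rule must allow as well.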
It is also better to use ANTLRWorks 2 for debugging your grammar files.