
How can I consume grammars in W3C EBNF notation and generate a parser from them?


Throughout the RDF specifications, grammars are written in the EBNF notation defined by the XML specification. So I am wondering how to use ANTLR, Bison, or Yacc (maybe via some flag in these tools that I don't know how to search for), or other tools I don't know about yet, to consume these specifications and generate a parser I can use to check whether my RDF is well-formed before trying to load it.

An example grammar for my specific use case is: https://www.w3.org/TR/n-quads/#sec-grammar

I have already converted this grammar into an ANTLR 4 grammar and generated a parser with that tool, and I have also tried writing my own recursive-descent parser. This was time-consuming, and I'd rather not repeat the exercise if I have to do this again.

I don't really have any code; this is just a request for information.

What I basically want is to copy and paste grammars written in this XML EBNF notation and have a tool generate a parser from them, similar to what ANTLR provides.


Solution

  • REx Parser Generator works from grammars in W3C-style EBNF, and Railroad Diagram Generator can extract grammars directly from W3C documents.

    Here is how to create a working parser from the example grammar (in Java; some other target languages are supported, too):

    • browse to Railroad Diagram Generator
    • on the Get Grammar tab, enter the example URL https://www.w3.org/TR/n-quads
    • proceed to Edit Grammar
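    • add a whitespace rule to the end of the grammar: WHITESPACE ::= [ #x9]+ /* ws: definition */ (the N-Quads grammar allows spaces and tabs between terminals but has no production for them; the ws: definition annotation marks this rule as the whitespace REx permits implicitly between tokens)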
    • save grammar to local file n-quads.ebnf
    • browse to REx Parser Generator
    • use input file n-quads.ebnf and command line -java -tree -main
    • save the resulting parser n_quads.java and compile it
    • run the parser on a sample file: java n_quads -i a-sample-file
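The whitespace rule above uses W3C EBNF character-class syntax, where #x9 is the tab character written as a hexadecimal code point. To make that notation concrete, here is a minimal sketch, assuming Java 7 or later, of how such a class maps onto a java.util.regex character class. The names W3cCharClass and toJavaRegexClass are hypothetical; this is not code from REx or the Railroad Diagram Generator, and it covers only character classes, not the rest of the notation (quoted strings, alternation, repetition, nonterminals). The second example class is of the kind used in the N-Quads IRIREF terminal.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of REx or the Railroad Diagram Generator:
// translates a W3C EBNF character class into a java.util.regex character class.
public class W3cCharClass {

    // Handles a leading '^' (negation), #xN hexadecimal code points and A-B
    // ranges; every other character is copied as a literal, escaping the ones
    // that are special inside a Java character class.
    public static String toJavaRegexClass(String w3c) {
        if (w3c.length() < 2 || w3c.charAt(0) != '[' || w3c.charAt(w3c.length() - 1) != ']') {
            throw new IllegalArgumentException("not a character class: " + w3c);
        }
        String body = w3c.substring(1, w3c.length() - 1);
        StringBuilder out = new StringBuilder("[");
        int i = 0;
        if (body.startsWith("^")) {
            out.append('^');
            i = 1;
        }
        while (i < body.length()) {
            char c = body.charAt(i);
            if (body.startsWith("#x", i) && i + 2 < body.length()
                    && Character.digit(body.charAt(i + 2), 16) >= 0) {
                // #xN (hex code point) becomes \x{N} in Java regex syntax
                int j = i + 2;
                while (j < body.length() && Character.digit(body.charAt(j), 16) >= 0) {
                    j++;
                }
                out.append("\\x{").append(body, i + 2, j).append('}');
                i = j;
            } else {
                // '-' keeps its range meaning; other special characters are
                // literals in W3C EBNF, so escape them for Java
                if ("\\[]^&".indexOf(c) >= 0) {
                    out.append('\\');
                }
                out.append(c);
                i++;
            }
        }
        return out.append(']').toString();
    }

    public static void main(String[] args) {
        // WHITESPACE ::= [ #x9]+  (one or more spaces or tabs)
        String ws = toJavaRegexClass("[ #x9]") + "+";
        System.out.println(ws);                          // [ \x{9}]+
        System.out.println(Pattern.matches(ws, " \t ")); // true

        // A negated class with a range and literal specials, as in IRIREF
        String iriChar = toJavaRegexClass("[^#x00-#x20<>\"{}|^`\\]");
        System.out.println(Pattern.matches(iriChar + "*", "http://example.org/s"));   // true
        System.out.println(Pattern.matches(iriChar + "*", "http://example.org/a b")); // false
    }
}
```

Translating terminals like this still leaves all of the nonterminal structure to be written by hand, which is the part a generator such as REx takes off your hands.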

    Full disclosure: I’m the creator and maintainer of REx Parser Generator.
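Conceptually, the first step for any tool that consumes these grammars is reading productions of the form [n] name ::= expression, which may continue over several lines. The following is a hypothetical sketch in Java (the names W3cGrammarReader and readProductions are made up for illustration, and this is not how the Railroad Diagram Generator works internally) that collects each production's name and right-hand side without parsing the expressions themselves. The sample grammar text is illustrative and only loosely modelled on the N-Quads grammar, not quoted from it.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration: splits W3C-style EBNF text into productions.
public class W3cGrammarReader {

    // Matches "[12] name ::= rhs" or "name ::= rhs"; the optional [n] is the
    // production number used in the W3C specs.
    private static final Pattern PRODUCTION = Pattern.compile(
            "^\\s*(?:\\[\\d+\\]\\s*)?([A-Za-z_][A-Za-z0-9_]*)\\s*::=\\s*(.*)$");

    // Returns production name -> right-hand side, in grammar order.
    // Non-blank lines without "::=" are appended to the previous production.
    public static Map<String, String> readProductions(String grammarText) {
        Map<String, String> productions = new LinkedHashMap<>();
        String current = null;
        for (String line : grammarText.split("\\R")) {
            Matcher m = PRODUCTION.matcher(line);
            if (m.matches()) {
                current = m.group(1);
                productions.put(current, m.group(2).trim());
            } else if (current != null && !line.trim().isEmpty()) {
                productions.merge(current, line.trim(),
                        (rhs, more) -> rhs.isEmpty() ? more : rhs + " " + more);
            }
        }
        return productions;
    }

    public static void main(String[] args) {
        // Illustrative input only, loosely modelled on the N-Quads grammar
        String grammar =
                "[1] nquadsDoc ::= statement? (EOL statement)* EOL?\n"
              + "[2] statement ::= subject predicate object graphLabel? '.'\n"
              + "[3] subject ::= IRIREF\n"
              + "              | BLANK_NODE_LABEL\n"
              + "WHITESPACE ::= [ #x9]+ /* ws: definition */\n";
        for (Map.Entry<String, String> e : readProductions(grammar).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

For the sample input, the continuation line is joined so that subject comes out as IRIREF | BLANK_NODE_LABEL. A real tool would then parse each right-hand side into an expression tree, which is where most of the remaining work lies.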