xml, antlr, abstract-syntax-tree, code-translation

AST to XML in general (maybe ANTLR)


I need to parse files written in several languages (Java, C, C#, ...) and then convert the AST (abstract syntax tree) to XML. (Actually, the aim is to manipulate it and translate it to another language; this second part has already been implemented.) After some investigation, I found that there is no common approach to doing this.

The closest tool is srcML. But the first problem is that it is not written in Java =). The second problem is the number of supported languages (only 3).

I know that DMS can solve this problem, but it is neither free nor open source.

So, as I understand it, there is only one way to do this: take ANTLR and try to convert its AST to XML. So the question is how to do this with ANTLR (Java), or whether I am missing some other (non-ANTLR) way to do it.


Solution

  • There are more Java tools besides ANTLR that can do this (JavaCC is a popular alternative, to name just one).

    To solve this problem with a parser generator, you'd need to do the following:

    1. define a grammar from which the tool generates a lexer and parser (in your case, you need 3 grammars for your 3 languages);
    2. iterate over the AST your parser created, and output plain text (XML, in your case).
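
    Step 2 above can be sketched roughly as follows. This is only a minimal illustration, not ANTLR's or JavaCC's API: `AstNode` is a hypothetical stand-in for whatever tree type your parser produces, and `AstToXml` is a made-up helper name. It assumes each node's type is a valid XML element name and stores token text in a `text` attribute, which is one possible mapping among many.

    ```java
    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical, parser-independent tree node: a stand-in for whatever AST
    // type your parser (ANTLR, JavaCC, ...) actually produces.
    final class AstNode {
        final String type;  // node kind, e.g. "BinaryExpr"; assumed to be a valid XML name
        final String text;  // token text, or null if the node carries none
        final List<AstNode> children = new ArrayList<>();

        AstNode(String type, String text) {
            this.type = type;
            this.text = text;
        }

        AstNode add(AstNode child) {
            children.add(child);
            return this;
        }
    }

    final class AstToXml {
        // Escapes the characters that must not appear verbatim in an attribute value.
        static String escape(String s) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < s.length(); i++) {
                char c = s.charAt(i);
                switch (c) {
                    case '&': sb.append("&amp;"); break;
                    case '<': sb.append("&lt;"); break;
                    case '>': sb.append("&gt;"); break;
                    case '"': sb.append("&quot;"); break;
                    default: sb.append(c);
                }
            }
            return sb.toString();
        }

        // Depth-first walk: one XML element per node, children nested inside it.
        static void write(AstNode node, int depth, StringBuilder out) {
            StringBuilder indent = new StringBuilder();
            for (int i = 0; i < depth; i++) {
                indent.append("  ");
            }
            out.append(indent).append('<').append(node.type);
            if (node.text != null) {
                out.append(" text=\"").append(escape(node.text)).append('"');
            }
            if (node.children.isEmpty()) {
                out.append("/>\n");  // leaf: self-closing element
                return;
            }
            out.append(">\n");
            for (AstNode child : node.children) {
                write(child, depth + 1, out);
            }
            out.append(indent).append("</").append(node.type).append(">\n");
        }

        static String toXml(AstNode root) {
            StringBuilder out = new StringBuilder();
            write(root, 0, out);
            return out.toString();
        }

        public static void main(String[] args) {
            // Hand-built tree for the expression "a < 1", as a parser might produce it.
            AstNode root = new AstNode("BinaryExpr", "<")
                    .add(new AstNode("Identifier", "a"))
                    .add(new AstNode("IntLiteral", "1"));
            System.out.print(toXml(root));
        }
    }
    ```

    With a real parser you would replace `AstNode` with the generated tree type and read the node kind, token text and children through that tool's own accessors. For larger trees, a proper XML writer from the JDK (for example `javax.xml.stream.XMLStreamWriter`) is probably a safer choice than hand-built strings.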

    Grammars for Java, C# and C are available on ANTLR's Wiki, and I'm sure ready-made grammars exist for JavaCC and other parser generator tools as well (Google is your friend here). But be aware that it is a Wiki: many grammars are in an experimental state or contain errors.

    You could also skip step #1 and find an existing parser that constructs the AST for you. You would then only need to walk the AST yourself and create XML from it. Here's a Java 5 parser, for example (for the other ones, again, Google is your friend).

    Good luck.