
How can I create my own programming language targeting the JVM?


I would like to create my own programming language targeting the JVM. I am unsure how to do this. Must I create my own compiler? Do all programming languages have unique compilers, or are there existing ones that can be adapted?

I have found some information about targeting the .NET CLI.

I've also found the Dragon Book on compiler design.


Solution

  • Yes, each language generally has its own compiler. There are a few types of compiler that can be written; each one is more complicated than the last and builds on it:

    1. recogniser: only answers whether the input source is valid syntax;
    2. parser: creates an in-memory representation of the input source (called an AST, abstract syntax tree);
    3. compiler: generates a translated form of the input;
    4. optimising compiler: as 3, but optimises the AST before generating the output.
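    To make the first rung concrete, here is a minimal recogniser sketch in Java for a hypothetical toy language of integers joined by '+' (e.g. '1+2'). The class name `Recogniser` and the grammar are illustrative assumptions; it builds nothing and only answers yes or no.

```java
// Minimal recogniser for a hypothetical toy language: integers joined by '+',
// e.g. "1+2" or "10 + 20 + 3". It builds no AST; it only answers yes/no.
class Recogniser {
    private final String src;
    private int pos = 0;

    private Recogniser(String src) {
        this.src = src;
    }

    /** Returns true if the whole input matches: number ('+' number)* */
    static boolean recognise(String src) {
        Recogniser r = new Recogniser(src);
        if (!r.number()) {
            return false;
        }
        while (r.peek() == '+') {
            r.pos++;                      // consume '+'
            if (!r.number()) {
                return false;             // a '+' must be followed by a number
            }
        }
        r.skipSpaces();
        return r.pos == src.length();     // reject trailing junk such as "1+2)"
    }

    // Consumes one or more digits; returns false if none are present.
    private boolean number() {
        skipSpaces();
        int start = pos;
        while (pos < src.length() && Character.isDigit(src.charAt(pos))) {
            pos++;
        }
        return pos > start;
    }

    // Returns the next non-space character, or '\0' at end of input.
    private char peek() {
        skipSpaces();
        return pos < src.length() ? src.charAt(pos) : '\0';
    }

    private void skipSpaces() {
        while (pos < src.length() && src.charAt(pos) == ' ') {
            pos++;
        }
    }

    public static void main(String[] args) {
        for (String s : new String[] {"1+2", "10 + 20 + 3", "1+", "+2", "1+2)"}) {
            System.out.println(s + " -> " + recognise(s));
        }
    }
}
```

    Turning this into a parser (rung 2) mostly means having `number()` and the loop return tree nodes instead of booleans.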

    All of these compiler forms usually reuse tools that are specially designed to help with the different stages of compilation. Briefly:

    Parsing: I would recommend parboiled for Java. Older tools tend to be variants of lex and yacc, two Unix tools for the lexical and grammar stages of parsing. ANTLR and JavaCC are two examples that run on the JVM; however, parboiled is just awesome.
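    As an illustration of the lexical stage that lex, ANTLR, JavaCC and parboiled automate (each in its own way), here is a hand-written lexer sketch. It deliberately uses no parsing library, so it is not how any of those tools are driven; the token set (`NUMBER`, `PLUS`, `STAR`, parentheses) is a hypothetical one for a toy arithmetic language, and it assumes Java 16+ for records and switch expressions.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written lexer for a hypothetical toy arithmetic language.
// It does the job that lexer generators automate: characters in, tokens out.
class Lexer {
    enum Kind { NUMBER, PLUS, STAR, LPAREN, RPAREN, EOF }

    record Token(Kind kind, String text) {}

    static List<Token> tokenize(String src) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;                                   // whitespace only separates tokens
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < src.length() && Character.isDigit(src.charAt(i))) {
                    i++;                               // take the longest run of digits
                }
                tokens.add(new Token(Kind.NUMBER, src.substring(start, i)));
            } else {
                Kind kind = switch (c) {
                    case '+' -> Kind.PLUS;
                    case '*' -> Kind.STAR;
                    case '(' -> Kind.LPAREN;
                    case ')' -> Kind.RPAREN;
                    default -> throw new IllegalArgumentException(
                            "unexpected character '" + c + "' at " + i);
                };
                tokens.add(new Token(kind, String.valueOf(c)));
                i++;
            }
        }
        tokens.add(new Token(Kind.EOF, ""));           // end marker for the parser
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : tokenize("12 + (3*4)")) {
            System.out.println(t);
        }
    }
}
```

    A generated lexer produces the same kind of token stream from a declarative description; writing one by hand once is mainly useful for understanding what the tool does for you.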

    AST: I do not know of any tool here. One could reuse the AST model from another JVM language's compiler, such as javac, but I would personally create this myself.
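    If you do build the AST yourself, Java 17's records and sealed interfaces make it compact. The sketch below defines a hypothetical three-node AST (`Num`, `Add`, `Mul`) and a hand-written recursive-descent parser that builds it, with one method per grammar rule. The grammar and all names are assumptions for illustration, not taken from any existing JVM language.

```java
// Hypothetical AST plus a recursive-descent parser that builds it (Java 17+).
//   expr   := term ('+' term)*
//   term   := factor ('*' factor)*
//   factor := NUMBER | '(' expr ')'
class Parser {
    sealed interface Expr permits Num, Add, Mul {}
    record Num(long value) implements Expr {}
    record Add(Expr left, Expr right) implements Expr {}
    record Mul(Expr left, Expr right) implements Expr {}

    private final String src;
    private int pos = 0;

    private Parser(String src) { this.src = src; }

    static Expr parse(String src) {
        Parser p = new Parser(src);
        Expr e = p.expr();
        p.skipSpaces();
        if (p.pos != src.length()) throw p.error("unexpected input");
        return e;
    }

    private Expr expr() {
        Expr left = term();
        while (accept('+')) {
            left = new Add(left, term());     // left-associative: 1+2+3 = (1+2)+3
        }
        return left;
    }

    private Expr term() {
        Expr left = factor();
        while (accept('*')) {
            left = new Mul(left, factor());   // '*' binds tighter than '+'
        }
        return left;
    }

    private Expr factor() {
        if (accept('(')) {
            Expr inner = expr();
            if (!accept(')')) throw error("expected ')'");
            return inner;                     // parentheses leave no node in the AST
        }
        skipSpaces();
        int start = pos;
        while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
        if (pos == start) throw error("expected a number");
        return new Num(Long.parseLong(src.substring(start, pos)));
    }

    // Consumes c (after optional whitespace) if it comes next; reports whether it did.
    private boolean accept(char c) {
        skipSpaces();
        if (pos < src.length() && src.charAt(pos) == c) {
            pos++;
            return true;
        }
        return false;
    }

    private void skipSpaces() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
    }

    private IllegalArgumentException error(String msg) {
        return new IllegalArgumentException(msg + " at position " + pos);
    }

    public static void main(String[] args) {
        System.out.println(parse("1 + 2 * (3 + 4)"));
    }
}
```

    Because records get a structural `equals`, tests can compare a parsed tree directly against a hand-built expected tree.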

    Output generation: A quick approach is to generate Java source code, which has some limitations but is overall an excellent way of testing the water. When/if you decide to move on to generating JVM bytecode, a collection of helper libraries can be found here. However, there is a lot to learn about the JVM before attempting that route; Oracle's Java Virtual Machine Specification is mandatory reading.
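    A sketch of the "generate Java source" approach, assuming a hypothetical `Num`/`Add`/`Mul` AST (declared inside the class so the block stands alone). `emitProgram` wraps the translated expression in a class that javac can compile; the class name `Generated` is arbitrary.

```java
// Emits Java source code for a hypothetical toy-language AST (Java 17+).
class JavaSourceGen {
    sealed interface Expr permits Num, Add, Mul {}
    record Num(long value) implements Expr {}
    record Add(Expr left, Expr right) implements Expr {}
    record Mul(Expr left, Expr right) implements Expr {}

    // Translates one expression. Every binary node is parenthesised so the
    // toy language's precedence never depends on Java's, and literals get an
    // 'L' suffix so values outside the int range still compile.
    static String emitExpr(Expr e) {
        if (e instanceof Num n) return n.value() + "L";
        if (e instanceof Add a) return "(" + emitExpr(a.left()) + " + " + emitExpr(a.right()) + ")";
        if (e instanceof Mul m) return "(" + emitExpr(m.left()) + " * " + emitExpr(m.right()) + ")";
        throw new IllegalStateException("unknown node " + e);
    }

    // Wraps the expression in a complete compilation unit that prints its value.
    static String emitProgram(String className, Expr body) {
        return "public class " + className + " {\n"
             + "    public static void main(String[] args) {\n"
             + "        System.out.println(" + emitExpr(body) + ");\n"
             + "    }\n"
             + "}\n";
    }

    public static void main(String[] args) {
        Expr program = new Add(new Num(1), new Mul(new Num(2), new Num(3)));
        System.out.print(emitProgram("Generated", program));
    }
}
```

    Saving the printed output as `Generated.java` and running `javac Generated.java && java Generated` should print 7. Once this works end to end, swapping `emitExpr` for a bytecode emitter is a contained change, since the AST stays the same.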

    For general knowledge, the LLVM tutorial is excellent: it is quite short and very well written. I know that you said you wanted to target the JVM; however, nearly everything this tutorial covers will help you understand the parts required.

    I would recommend following the tutorial and rewriting it in Java. Its steps are very logical. Essentially, one would write a recogniser for a very simple language, such as one that only handles expressions like '1+2', and then write an interpreter for that language. That would be a very reasonable stopping point: many languages are interpreted, and Java started off its life like this too. Optionally, one can then move on to emitting a target output, say Java source code at first. The code for this would be fairly short and will give you quicker feedback than trying to write any single layer in full first; there are many opportunities to consume your coding hours if you go down that road.
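    The interpreter step might look like this for the '1+2' language: parse into a tiny AST, then evaluate it by recursively walking the tree. The class name `SumInterpreter` is hypothetical, and the parser uses `String.split` purely to keep the sketch short; a larger language would want a proper lexer and parser. Assumes Java 17+.

```java
// Tree-walking interpreter for a hypothetical '+'-only language (Java 17+).
class SumInterpreter {
    sealed interface Expr permits Num, Add {}
    record Num(long value) implements Expr {}
    record Add(Expr left, Expr right) implements Expr {}

    // Parses number ('+' number)* into a left-leaning tree of Add nodes.
    static Expr parse(String src) {
        String[] parts = src.split("\\+", -1);   // -1 keeps empty pieces, so "1+" is rejected
        Expr tree = null;
        for (String part : parts) {
            String digits = part.strip();
            if (digits.isEmpty() || !digits.chars().allMatch(Character::isDigit)) {
                throw new IllegalArgumentException("not a number: '" + part + "'");
            }
            Num n = new Num(Long.parseLong(digits));
            tree = (tree == null) ? n : new Add(tree, n);
        }
        return tree;
    }

    // Evaluates an AST node; this recursive walk is the whole interpreter.
    static long eval(Expr e) {
        if (e instanceof Num n) return n.value();
        if (e instanceof Add a) return eval(a.left()) + eval(a.right());
        throw new IllegalStateException("unknown node " + e);
    }

    public static void main(String[] args) {
        String[] inputs = args.length > 0 ? args : new String[] {"1+2", "10 + 20 + 3"};
        for (String line : inputs) {
            System.out.println(line + " = " + eval(parse(line)));
        }
    }
}
```

    Extending the language is then a matter of adding a node type and one more branch in `eval`; emitting Java source instead of computing a value would replace `eval` with a function that returns strings.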