build compiler-construction programming-languages interpreter

Starting point and help creating a programming language

I'm currently a university student and need to choose a topic for my Bachelor's thesis. I've wanted to create a language for a long time, and since I think I'm able to, I'd like to hear opinions on the following:

I know lots of languages, including C, C++, Python, Erlang, PHP and JavaScript.

I can pretty much choose whichever language I want as the base for implementing mine. The point is: I've seen a lot of people do it with Python, which is great, but I'm most skilled in PHP. Not plain PHP, of course; I'm a big Laravel fan.

Apparently, a community-driven project called Laravel Zero (http://laravel-zero.com/) allows creating great console applications in PHP, which made me wonder: what if I use this as my base?

A few key points: I don't care about speed or optimization.

Sorry to the C / C++ fans, but I won't choose either of those as a starting point.

If you're into programming languages, I'd also like to ask another question:

Is it better to create a compiled or interpreted language? Why?

As far as I know, creating an interpreted language will always require that "mother" language to be present somehow, since you can't self-host your interpreter unless it's in binary code.

Anybody got anything of interest to share with me? I would love to hear opinions and stuff about it.

For example, where's the best starting point, what should I look at before getting into this subject, etc. Anything would be of great help.

Thanks

Solution

For the most part the programming language does not matter. If you want to use lexer+parser generators, you'll want to use a language for which those are available. That's the case for most languages that aren't totally obscure or domain-specific (including PHP according to a quick search), but there are certainly significant differences in quality between different generators, so you might want to take a closer look at the quality of the available tools before picking a language. Of course that's only a consideration if you do want to use lexer and/or parser generators. If you're going to write your lexer and parser yourself, any language will do.
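To give a sense of how little a hand-written lexer needs from the host language, here is a minimal sketch in Python. The token set (`NUMBER`, `IDENT`, `OP`) and the names `Token` and `tokenize` are made up for illustration; the same idea carries over to PHP or any other language with regular expressions.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str
    text: str


# Hypothetical token set for a toy expression language.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=()]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source: str) -> Iterator[Token]:
    """Yield tokens left to right, skipping whitespace."""
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        pos = match.end()
        if match.lastgroup != "SKIP":
            yield Token(match.lastgroup, match.group())
```

Calling `list(tokenize("x = 3 + 42"))` produces one `IDENT`, two `OP` and two `NUMBER` tokens in source order.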

If you decide to write a compiler and you want to use LLVM as a back end, it'd be a plus if there are bindings for LLVM for your language. That does not seem to be the case for PHP (a search only brought up this extension, which is used to call functions in LLVM-bitcode, not to generate LLVM-bitcode). On the other hand, you can always generate LLVM-assembly as text and then invoke the LLVM command line tools. And if you're writing a compiler without LLVM or an interpreter, this does not matter anyway.
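As a rough illustration of the "generate LLVM-assembly as text" route, the sketch below turns a tiny arithmetic expression into a textual LLVM function using nothing but string formatting. The nested-tuple expression format and the name `emit_llvm` are assumptions made for this example, and it only handles 32-bit integer `+`, `-` and `*`.

```python
from itertools import count


def emit_llvm(expr) -> str:
    """Emit textual LLVM IR for expr, which is an int or a tuple (op, left, right)."""
    lines = []
    fresh = count()  # source of unique temporary names: %t0, %t1, ...
    opcodes = {"+": "add", "-": "sub", "*": "mul"}

    def gen(node) -> str:
        # Integer literals can be used directly as operands.
        if isinstance(node, int):
            return str(node)
        op, left, right = node
        lhs, rhs = gen(left), gen(right)
        name = f"%t{next(fresh)}"
        lines.append(f"  {name} = {opcodes[op]} i32 {lhs}, {rhs}")
        return name

    result = gen(expr)
    body = "\n".join(lines + [f"  ret i32 {result}"])
    return f"define i32 @main() {{\nentry:\n{body}\n}}\n"


print(emit_llvm(("*", ("+", 2, 3), 4)))
```

The printed text can be saved to a `.ll` file and handed to the LLVM command line tools (for example `llc` or `clang`), so no bindings are needed in the host language.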

It helps if your language has a map data structure to define the symbol table, but most languages have that.
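For instance, a symbol table with nested scopes can be little more than a chain of maps. This is only a sketch; the class name `SymbolTable` and the shape of the stored `info` values are invented for illustration.

```python
class SymbolTable:
    """One scope: a dict of names plus a link to the enclosing scope."""

    def __init__(self, parent=None):
        self.symbols = {}
        self.parent = parent

    def define(self, name, info):
        # Definitions always go into the innermost scope.
        self.symbols[name] = info

    def lookup(self, name):
        # Walk outward through enclosing scopes until the name is found.
        scope = self
        while scope is not None:
            if name in scope.symbols:
                return scope.symbols[name]
            scope = scope.parent
        raise NameError(f"undefined symbol: {name}")


global_scope = SymbolTable()
global_scope.define("x", {"type": "int"})

function_scope = SymbolTable(parent=global_scope)
function_scope.define("x", {"type": "string"})  # shadows the global x
function_scope.define("y", {"type": "int"})
```

Here `function_scope.lookup("x")` finds the inner definition, while `global_scope.lookup("y")` raises `NameError` because `y` only exists in the inner scope. In PHP the same structure would use associative arrays.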

I personally like functional languages for language implementations as immutable maps are a good way to represent symbol tables and algebraic data types are a good way to represent ASTs, but none of that is strictly necessary.
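The same style can be approximated in a non-functional language. The sketch below uses frozen dataclasses as a stand-in for algebraic data types and never mutates the environment: a `Let` builds a new map instead of updating the old one. The node types (`Num`, `Var`, `Add`, `Let`) and `evaluate` are hypothetical names for a toy language.

```python
from dataclasses import dataclass
from typing import Mapping, Union


@dataclass(frozen=True)
class Num:
    value: int


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Let:
    name: str
    bound: "Expr"
    body: "Expr"


Expr = Union[Num, Var, Add, Let]


def evaluate(node: Expr, env: Mapping[str, int]) -> int:
    if isinstance(node, Num):
        return node.value
    if isinstance(node, Var):
        return env[node.name]
    if isinstance(node, Add):
        return evaluate(node.left, env) + evaluate(node.right, env)
    if isinstance(node, Let):
        value = evaluate(node.bound, env)
        # Build a new environment; the caller's env is left untouched,
        # so inner bindings cannot leak into outer scopes.
        return evaluate(node.body, {**env, node.name: value})
    raise TypeError(f"unknown node: {node!r}")


# let x = 2 in x + (let x = 10 in x)
program = Let("x", Num(2), Add(Var("x"), Let("x", Num(10), Var("x"))))
```

Because each scope gets its own map, shadowing works without any explicit "leave scope" bookkeeping, which is the main attraction of the immutable approach.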

Almost any language that you're comfortable with can be used to implement languages without too much trouble.

Is it better to create a compiled or interpreted language?

That depends entirely on your requirements and the properties of your language. Note that "compiled" or "interpreted" aren't really properties of the language, but of the language's currently available implementations. There's the language and then there is its implementation (or implementations).

The more "dynamic" features your language has (such as defining new functions or variables at run time), the harder it is to write a compiler, but even without those, writing an interpreter tends to be easier. So it can certainly make sense to start with an interpreter, even if you plan to eventually go with a compiler (or a JIT compiler).

Most of the front-end and mid-end phases can remain untouched when switching from an interpreter to a compiler anyway. So that's not as much of a waste of existing work as you might think.
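A small sketch of that reuse, under heavy simplifying assumptions: one parser for a toy language (integers, `+`, `*`, parentheses) feeds both a tree-walking interpreter and a compiler to a made-up stack-machine instruction set. All function names and the instruction format are invented for this example, and error handling is omitted.

```python
import re


def parse(source):
    """Front end: text -> nested-tuple AST such as ("+", 1, ("*", 2, 3))."""
    tokens = re.findall(r"\d+|[+*()]", source)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def expr():  # expr := term ("+" term)*
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    def term():  # term := atom ("*" atom)*
        node = atom()
        while peek() == "*":
            advance()
            node = ("*", node, atom())
        return node

    def atom():  # atom := NUMBER | "(" expr ")"
        token = advance()
        if token == "(":
            node = expr()
            advance()  # consume ")"
            return node
        return int(token)

    return expr()


def interpret(node):
    """Back end A: evaluate the AST directly."""
    if isinstance(node, int):
        return node
    op, left, right = node
    a, b = interpret(left), interpret(right)
    return a + b if op == "+" else a * b


def compile_to_stack(node):
    """Back end B: translate the AST into stack-machine instructions."""
    if isinstance(node, int):
        return [("PUSH", node)]
    op, left, right = node
    instr = "ADD" if op == "+" else "MUL"
    return compile_to_stack(left) + compile_to_stack(right) + [(instr, None)]


def run_stack(code):
    """A tiny stack machine, used here only to check the compiled output."""
    stack = []
    for instr, arg in code:
        if instr == "PUSH":
            stack.append(arg)
        else:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if instr == "ADD" else a * b)
    return stack.pop()


ast = parse("1 + 2 * (3 + 4)")
```

Both `interpret(ast)` and `run_stack(compile_to_stack(ast))` consume the same `ast`; only the last stage differs, so `parse` would not need to change when moving from one back end to the other.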

As far as I know, creating an interpreted language will always require that "mother" language to be present somehow, since you can't self-host your interpreter unless it's in binary code.

Right, if you write an interpreter and your host language also only has interpreters, you'll need your interpreter as well as an interpreter for the host language (to run your interpreter) in order to run programs written in your language. Of course, you can always rewrite your interpreter in a language for which compilers exist, which isn't any more work than self-hosting (which is a complete rewrite anyway unless your source language is supposed to be so close to your host language that you can write your interpreter in the intersection of the two languages).

Until you create a self-hosting compiler, the same would be true for your compiler though: as long as your compiler is written in PHP, you'll need PHP to compile your language (though not to run the compiled programs).

For example, where's the best starting point, what should I look at before getting into this subject, etc. Anything would be of great help.

The tag wiki for the compiler construction tag has a list of resources about compiler construction. Much of that information is also relevant when building interpreters.