I have been wanting to create my own programming language and I am looking to start writing a basic compiler. I am doing this purely for learning purposes.
I will be writing the compiler in C#.
I have been trying to decide whether to generate IL or to emit another high-level language. From the articles and tutorials I have read, C and MSIL (by way of Reflection.Emit) seem to be the most popular targets.
I am wondering which approach will make my programming language faster, assuming both are implemented optimally. Ideally I'd like the language to run on Windows as well as Linux/OSX. I also understand that there may be better alternatives that I am not considering.
Your decision generally depends on the design and paradigms of your language. If the language is small and has no complex object-oriented features, then you will only use the non-object-oriented parts of IL, and the choice comes down to:
- The availability of the .NET virtual machine and BCL versus the C standard library for implementing your language. This covers memory management and the implementation of primitive types such as ints and strings.
- Code generation: stack-based IL versus high-level C syntax. It can be easier to generate the high-level constructs of another language (you do not have to cover the whole C grammar, only the parts you need), but for learning purposes it is more useful to learn how to generate low-level instructions such as IL opcodes. Also consider splitting your tool into a frontend and a backend, as most established compilers do. Then you can plug in different backends for code generation.
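To make the frontend/backend split and the two styles of code generation concrete, here is a minimal sketch. It is written in Python only for brevity; the same shape carries over to C#. The `Num` and `BinOp` node types and the `emit_c` / `emit_il` functions are hypothetical names invented for this illustration, and the IL backend only prints opcode mnemonics as text (`ldc.i4`, `add`, `sub`, `mul`, `div`) rather than calling Reflection.Emit.

```python
from dataclasses import dataclass
from typing import List, Union

# Hypothetical AST that a frontend (lexer + parser) would produce.
@dataclass
class Num:
    value: int

@dataclass
class BinOp:
    op: str          # one of "+", "-", "*", "/"
    left: "Expr"
    right: "Expr"

Expr = Union[Num, BinOp]

def emit_c(node: Expr) -> str:
    """Backend 1: render the tree as a fully parenthesised C expression."""
    if isinstance(node, Num):
        return str(node.value)
    return f"({emit_c(node.left)} {node.op} {emit_c(node.right)})"

# CIL arithmetic mnemonics; ldc.i4 pushes a 32-bit integer constant.
IL_OPCODES = {"+": "add", "-": "sub", "*": "mul", "/": "div"}

def emit_il(node: Expr) -> List[str]:
    """Backend 2: post-order walk, so both operands are on the
    evaluation stack before the operator instruction runs."""
    if isinstance(node, Num):
        return [f"ldc.i4 {node.value}"]
    return emit_il(node.left) + emit_il(node.right) + [IL_OPCODES[node.op]]

# The same tree, (2 + 3) * 4, fed to both backends.
tree = BinOp("*", BinOp("+", Num(2), Num(3)), Num(4))
c_source = emit_c(tree)
il_listing = emit_il(tree)

print(c_source)
print("\n".join(il_listing))
```

The C backend can lean on the target language's expression syntax, while the IL backend has to linearise the tree into stack operations itself. Because both consume the same tree, swapping one for the other does not touch the frontend.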
PROS for IL:
- a more thorough learning experience and a complete result: your compiler will not require any other tools and will be self-contained;
- the presence of the BCL and the resource-management layers of the CLR;
- the ability to bootstrap your compiler by interoperating with C# code;
- hands-on experience with the .NET platform, which is useful if you plan to improve your C# and .NET skills.
PROS for C:
- the ability to use existing C compiler backends to generate native code and perform optimizations; you can compile your C output for every platform that has a C compiler;
- no dependency on the CLR: you will not need the .NET Framework or Mono to run the produced output. Mono is mature today and runs on both Mac and Linux, but the choice remains: IL or native code.
Many modern languages compile to another high-level language (there are tons of something-to-JS tools today), and some languages are even designed to be compiled to another high-level language (CoffeeScript to JavaScript). Keep in mind that you have other options too, for example the LLVM intermediate representation.
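For a feel of what targeting LLVM might look like, here is a rough sketch that prints LLVM-style textual IR for an arithmetic expression. It is in Python for brevity, and it is an illustration under assumptions rather than a real backend: the `Num` / `BinOp` nodes and `emit_llvm` are hypothetical names, everything is treated as a signed 32-bit integer, and a real compiler would more likely go through LLVM's API or a binding than concatenate strings.

```python
from dataclasses import dataclass
from itertools import count
from typing import List, Union

# Hypothetical AST node types, invented for this example.
@dataclass
class Num:
    value: int

@dataclass
class BinOp:
    op: str          # one of "+", "-", "*", "/"
    left: "Expr"
    right: "Expr"

Expr = Union[Num, BinOp]

# LLVM integer instructions; sdiv is signed division.
LLVM_OPS = {"+": "add", "-": "sub", "*": "mul", "/": "sdiv"}

def emit_llvm(tree: Expr) -> str:
    """Emit a function @main that returns the value of the expression."""
    lines: List[str] = []
    temps = count()  # source of fresh SSA names: %t0, %t1, ...

    def visit(node: Expr) -> str:
        # Returns the operand text (a constant or an SSA name) for this node.
        if isinstance(node, Num):
            return str(node.value)
        lhs = visit(node.left)
        rhs = visit(node.right)
        name = f"%t{next(temps)}"
        lines.append(f"  {name} = {LLVM_OPS[node.op]} i32 {lhs}, {rhs}")
        return name

    result = visit(tree)
    return "\n".join(
        ["define i32 @main() {", "entry:", *lines, f"  ret i32 {result}", "}"]
    )

# (2 + 3) * 4
ir_text = emit_llvm(BinOp("*", BinOp("+", Num(2), Num(3)), Num(4)))
print(ir_text)
```

Unlike stack-based IL, every intermediate result gets its own name here, so the generator has to hand out fresh temporaries instead of relying on an implicit evaluation stack.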