I'm writing my own compiler and I'm struggling to implement a module system. Can someone guide me on how this should be done? How do other languages tackle this? I'm also trying to avoid what C and C++ do (header files). I do like the module system in Go/Golang, though.
I don't know if this is relevant, but I'm using LLVM (maybe there's a magic way to import symbols).
my initial approach:
this leads to a tree structure:
... etc.
Then I would traverse each node and copy its symbols (functions, global variables, etc.) to the parent node's symbol table. If a node's parent is null, it is an entry point file and the compiler can start outputting object files.
why do I think that this is bad?
Thanks in advance
I think your approach is really good. Compile-time speed is not that important; usability is. To prevent name collisions you can use some kind of module namespace (importname.foo() instead of just foo()), and whenever there is no other foo to collide with, allow both forms. Alternatively, you could insert a placeholder in the parent's symbol table, and whenever the user uses that name you throw a compile-time error (something like "ambiguous symbol"). That would look like this:
main.mylang
import module1
import module2
int main() {}
module1.mylang
import module2
void foo() {}
void bar() {}
module2.mylang
import module1
void bar() {}
void fun() {}
After finding loops, the tree would look like this:
main
├──module1
│   └──module2
└──module2
    └──module1
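The loop handling behind that tree can be sketched as follows. This is only a minimal illustration in Python, not how any particular compiler does it: the IMPORTS table and the import_tree and render helpers are hypothetical names, and the rule "skip an import that is already on the current path" is one assumed way of cutting loops.

```python
# Hypothetical import table mirroring the three example files above.
IMPORTS = {
    "main": ["module1", "module2"],
    "module1": ["module2"],
    "module2": ["module1"],
}

def import_tree(module, imports, path=()):
    """Return (name, children), skipping any import already on the current path."""
    path = path + (module,)
    children = [
        import_tree(dep, imports, path)
        for dep in imports.get(module, [])
        if dep not in path  # importing an ancestor would close a loop
    ]
    return (module, children)

def render(node, depth=0):
    """Flatten the tree into indented lines, one module per line."""
    name, children = node
    lines = ["  " * depth + name]
    for child in children:
        lines.extend(render(child, depth + 1))
    return lines

tree = import_tree("main", IMPORTS)
print("\n".join(render(tree)))
```

Each module appears once per import path, so module1 and module2 both show up twice, but neither is expanded again below itself.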
And a graph like this:
main
├─>main()
├─>foo() (module1)
├─>bar() (defined twice, throw error when used)
├─>fun() (module2)
├─>module1<───────────┐
│  ├─>foo() (module1) │
│  └─>bar() (module1) │
└─>module2<───────────┘
   ├─>bar() (module2)
   └─>fun() (module2)
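A rough sketch of the placeholder idea for main's unqualified names, in Python. Everything here is an assumption made for illustration: the DEFINED table, the AMBIGUOUS sentinel, and the merge_imports and lookup helpers are hypothetical, and treating every collision (even with a local definition) as ambiguous is just one possible policy.

```python
AMBIGUOUS = object()  # placeholder stored when two sources define the same name

# Hypothetical per-module definitions taken from the example files.
DEFINED = {
    "module1": ["foo", "bar"],
    "module2": ["bar", "fun"],
}

def merge_imports(own_symbols, imported_modules, defined):
    """Build a file's table of unqualified names from its direct imports."""
    table = {name: "<local>" for name in own_symbols}
    for module in imported_modules:
        for name in defined[module]:
            if name not in table:
                table[name] = module      # first definition seen
            elif table[name] != module:
                table[name] = AMBIGUOUS   # second, conflicting definition
    return table

def lookup(table, name):
    """Return the owning module, or fail the way a compile-time error would."""
    owner = table.get(name)
    if owner is None:
        raise NameError(f"unknown symbol '{name}'")
    if owner is AMBIGUOUS:
        raise NameError(f"ambiguous symbol '{name}', qualify it with a module name")
    return owner

main_table = merge_imports(["main"], ["module1", "module2"], DEFINED)
print(lookup(main_table, "foo"))  # owner of foo
print(lookup(main_table, "fun"))  # owner of fun
```

The error is raised only when bar is actually used without a qualifier, so importing two modules that both define bar is harmless by itself.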
I don't know much about LLVM, but I am pretty sure normal symbol tables are not enough to achieve this. You will need at least nested tables, if not a graph-like structure like the one I described. This is also not possible with the classical C/C++ architecture, unless you use unique identifiers as symbols and don't let the user know (as C++ does for function overloading). For example, you could call one function __module1_bar and the other __module2_bar, and whenever the user uses bar() you look up in this graph which one they want to call. In the main function, using module1.bar() will lead you to __module1_bar (follow the graph), and module2.bar() or module1.module2.bar() will lead you to __module2_bar.
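Following the graph for a qualified name could look roughly like this. It is a sketch under assumptions, written in Python and using the module names from the example files: the IMPORTS and DEFINED tables and the mangle and resolve helpers are hypothetical, and the "__<module>_<name>" scheme is only the naming idea from this answer, not anything LLVM prescribes.

```python
# Hypothetical tables mirroring the three example files.
IMPORTS = {
    "main": ["module1", "module2"],
    "module1": ["module2"],
    "module2": ["module1"],
}
DEFINED = {
    "main": ["main"],
    "module1": ["foo", "bar"],
    "module2": ["bar", "fun"],
}

def mangle(module, name):
    """Unique internal symbol that the user never sees."""
    return f"__{module}_{name}"

def resolve(current_module, qualified_name):
    """Resolve 'a.b.name' by following import edges from current_module."""
    *path, name = qualified_name.split(".")
    module = current_module
    for step in path:
        if step not in IMPORTS.get(module, []):
            raise NameError(f"'{module}' does not import '{step}'")
        module = step
    if name not in DEFINED.get(module, []):
        raise NameError(f"'{module}' does not define '{name}'")
    return mangle(module, name)

for call in ("module1.bar", "module2.bar", "module1.module2.bar"):
    print(call, "->", resolve("main", call))
```

Because the walk only follows as many edges as the qualified name has segments, the loop between module1 and module2 cannot cause infinite recursion here. Unqualified names would still need the ambiguity check described earlier.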
Good luck figuring that out. It is certainly an interesting problem.