module compiler-construction language-design

implementing a module system in a programming language

I'm writing my own compiler and I'm struggling to implement a module system. Can someone guide me, how should this be done? how other languages tackle this? Also I'm trying to avoid what c and c++ do (header files). I do like the module system in Go/Golang though.

I don't know if this is relevant, but I'm using LLVM (maybe there's a magic way to import symbols).

my initial approach:

read and parse the entry point source file ie. main.mylang.
go through the imports of main.mylang
for each import: read, parse and resolve it's imports ...

this leads to a tree structure:

main.mylang: import1.mylang, import2.mylang, import3.mylang
import1.mylang: import4.mylang, import5.mylang
import2.mylang: import6.mylang

... etc.

then I would traverse each node and copy it's symbols (functions, global variables, etc.) to the parent node's symbol table. if a parent node is null, it's an entry point file and the compiler can start output object files.

why do I think that this is bad?

it's very slow, even when compiling 3-5 source files
it's easy to cause name collisions
you have to import the entire symbol table, because the imported file's exported symbols depend on the internal ones. for example: imagine an exported function that modifies an internal global variable

Thanks in advance

Solution

I think your approach is really good. Compile time speed is not that important, usability is. To prevent name collisions you can use some kind of module-namespace (importname.foo() instead of just foo()) and whenever foo does not exist allow both methods. Alternatively you could insert a placeholder in the parents symbol table and whenever the user uses that name you throw a compile time error (something like ambiguous symbol). that would look like this: main.mylang

import module1
import module2
int main() {}

module1.mylang

import module2
void foo() {}
void bar() {}

module2.mylang

import module1
void bar() {}
void fun() {}

After finding loops, the tree would look like this:

main
├──module1
│  └──module2
└──module2
   └──module1

And a graph like this:

main
├─>main()
├─>foo() (module1)
├─>bar() (defined twice, throw error when used)
├─>fun() (module2)
├─>module1<───────────┐
│  ├─>foo() (module1) │
│  └─>bar() (module1) │
└─>import2<───────────┘
   ├─>bar() (module2)
   └─>fun() (module2)

I don't know much about llvm, but I am pretty sure normal tables are not enough to archive this. You will at least need nested tables if not even a graph like structure like I described. Also this is not possible with classical C/C++ architecture, except if you use unique identifiers as symbols and don't let the user know (like c++ function overloading). For example you could call one function __import1_bar and the other __import2_bar and whenever the user uses bar() you look up in this graph which one he wants to call. In the main function using import1.bar() will lead you to __import1_bar (follow the graph) and import2.bar() or import1.import2.bar() will lead you to __import2_bar.

Good Luck figuring that out. But it is certainly a interesting problem.