Search code examples
calgorithmgarbage-collectioncompiler-construction

Algorithm to have garbage collection when compiling to c


I need some sort of algorithm to add garbage collection to my language (which is being compiled to c) and add a free statement or some other way so that it will have no memory leaks.

Yes, I looked at Garbage collection when compiling to C, but I do not understand the answer, and was hoping to get a more detailed answer on how to do it.

EDIT: for example if the code is

int *i = malloc(4);

in my language, this should be compiled to

int *i = malloc(4);

and then

free(i);

once i is no longer in use or out of the stack frame Thanks


Solution

  • You first need to read the garbage collection handbook.

    You later need to document, in written English, the conventions and invariants of your garbage collector. Is it a generational GC? Is it multi-thread friendly? Is it precise or conservative?

    The book by C.Queinnec Lisp In Small Pieces is helpful. It describes how to code various Lisp interpreters, and some Lisp to C compiler. Some chapters are related to garbage collection and their relation to generated C code.

    The Dragon book (about compilation) has a chapter on GC.

    A.Appel's book Compiling with Continuations is also helpful.

    You then could document and probably define macros implementing your GC conventions.

    Notice that malloc could be considered as a slow way of allocating garbage collected data. Read for example Appel's old paper Garbage Collection can be faster than Stack allocation (it was debated later, but it does give a good intuition). You could consider fetching large memory zones with mmap(2) and allocating inside them in some faster way. Then you won't free individual garbage values (if you adopt a copying GC strategy, using Cheney's algorithm), but will munmap(2) a large memory zone at once. Study also the C source code of malloc implementation inside GNU libc or musl libc.

    See my Bismon project as an example of C code (open source, for Linux) with GC.

    Look also inside the C code of Ocaml interpreter and compiler.

    Or inside the C runtime of SBCL or of Chicken/Scheme.

    Or inside the code of some open source JVM.

    The Bigloo project is a Lisp to C compiler.

    The GNU emacs editor contains a garbage collector. The GCC compiler also contains one.

    Circular references are difficult to handle with reference counting schemes.

    Consider also using Boehm conservative garbage collector open source library.

    Your GC will be operating system specific, and probably target processor specific.

    The RefPerSys project (in C++, with generation of C++ code at runtime) has a GC.

    At last, the valgrind utility (a tool to detect memory leaks) is open source and can be considered as containing some GC.

    Read also recent papers submitted to ACM SIGPLAN conferences. Several of them are related to garbage collection. Consider later submitting your own paper on GC.

    Budget several years of full time work.

    PS. As an introduction, read the old paper by P.Wilson Uniprocessor Garbage Collection Techniques