Search code examples
listcompiler-errorsocamlcode-generationocaml-dune

Stack overflow during compilation on large list literal in OCaml


I have a small project for checking if a number is prime. The idea is to prepare (generate) a list of primes up to a certain point for the library function to use (use a code generator for that).

Whilst the code generation works fine, compilation of generated file fails if the list is large (for example for list of primes up to a million, that is ~75k entries).

Output of dune build:

enter image description here

Part (at the bottom) of output of dune build --verbose:enter image description here

Some code:

(* prime_list.mli *)
val long_list : int list
(* THIS FAILS TO COMPILE: prime_list.ml (was generated, found only in _build directory) *)
let long_list = [2;3;5;7;
<a lot more entries, depending on parameter provided in dune file...>]

As I understand, the list literal, (which is of course sugar for 2 :: (3 :: ( 5 :: ... ))) has too deeply nested calls to :: and the stack overflows, during compilation (probably parsing rather than constructing the value? I don't know how the compilation works in OCaml at all sadly).

Do you have an idea for a way around that? I would like to have a list as long as I please, wrote every single function using tail recursion and the compiler is letting me down like this...

To recap: I have the list in memory (in generator) but am not able to dump it to a file (preferably .ml) in a way that allows me to bake it in the main program, which is frustrating.

Out of desperation I also tried generating a file like so:

(* prime_list.ml, another try *)
let long_list = []
let long_list = 999983 :: long_list
let long_list = 999979 :: long_list
...
let long_list = 3 :: long_list
let long_list = 2 :: long_list

as well as

let long_list0 = []
let long_list1 = 999983 :: long_list0
let long_list2 = 999979 :: long_list1
...
let long_list78487 = 3 :: long_list78496
let long_list78498 = 2 :: long_list78497
let long_list = long_list78498

but both these approcheas resulted in the same thing (Fatal error: exception Stack overflow). I am new to OCaml and I don't understand why exactly didn't this work. I am interested in an answer -- may be a bit of speculation -- to this question. Besides a solution to my problem of course.

Maybe I could change the limit of the stack somehow? Answers coming from this angle (of compiler settings and flags) will be much appreciated, but this wouldn't really be satisfactory as the final answer, as it probably doesn't scale indefinitely (I may be wrong here though, who knows?).

Here are lib/dune file contents for completeness:

; primes_with_generator/lib/dune
(library
 (name primes_with_generator)
 (libraries commons))

(rule
 (target prime_list.ml)
 (deps    (:gen ../generator/gen.exe))
 (action  (with-stdout-to %{target} (run %{gen} 1000000))))

1_000_000 is a parameter here -- primes up to 1_000_000 are generated. For primes generated up to 100_000 everything is fine, but I would like primes up to sqrt(INT MAX) ≈ 2^32 ≈ 10^9... well, at least wouldn't like to stop at ~200k because of a dumb reason like this.

Commons is probably irrelevant, but here is its interface for you to see:

(* commons/prime.mli *)

(* given arguments:
 - list of primes up to a point, should be in ascending order, may be empty),
 - a number n >= 2
   returns answer to the question: is number n prime?
*)
val is_prime : int list -> int -> bool

Solution

  • The quick and dirty solution to your issue is to increase stack size (with ulimit -s unlimited on linux or by switching to OCaml 5).

    The scalable solution is to not mix code and data. Your table of prime number is data and should be stored as such. As a quick storage format, you could use the Marshal module. If for some reason, you really don't want to have a separate data file, you can store your data as string:

    let primes: int array =
      Marshal.from_string
        "\132\149\166\190\000\000\000\006\000\000\000\001\000\000\000\006\000\000\000\006\208BCEGK"
        0
    

    which should work until the point where the array will be too big to keep fully in memory.

    EDIT: In order to load data from a file, you can use In_channel.with_open_bin which takes care of opening and closing the channel (even if an exception is raised) by itself

    let load filename: int array =
      In_channel.with_open_bin filename @@ fun chan ->
      Marshal.from_channel chan