
Generating executable .ml test cases from a glob of plaintext files using dune


I'm in the process of writing a test suite for some lexing/parsing and it would be much cleaner if I could drop test input/output files in a directory and have dune generate OCaml test cases for each of these during a step in compilation.

I figured I could use dune for this, very much inspired by this documentation page (Preprocessors and PPXs), but I'm struggling to get it to work. I've essentially come to 2 dead ends:

  1. An alias rule that would execute a script padding each of the test files seemingly wouldn't work:

    (tests
      (names lexer)
      (libraries llvmlexer llvmparser ounit2))
    
    (rule
      (alias runtest)
      (deps (glob_files %{workspace_root}/**/*.ll))
        (action (system "./preprocess-lexer.sh '%{input-file}'")))
    

As it errors with:

File "test/dune", line 9, characters 41-54:
9 |  (action (system "./preprocess-lexer.sh '%{input-file}'")))
                                             ^^^^^^^^^^^^^
Error: %{input-file} isn't allowed in this position.

I'm very confused by this. Is this a matter of executing the action once for all files? If so, is it possible to execute it once for each dependency?

  2. Specifying every source/target pair explicitly wouldn't work either, as that would entail listing them all in the dune file, since wildcard rules are apparently still not a thing: https://github.com/ocaml/dune/issues/307

Solution

  • Indeed, dune doesn't support wildcard rules at the time of writing. It has, however, very limited support for it tailored for preprocessing so that you can specify a rule of the following form *.ml -> *.pp.ml, exactly with these suffixes, e.g.,

    (library
     (name foo)
     (preprocess (action (run cpp %{input-file}))))
    

    And then if you have a file bar.ml

    #define X 1
    
    let x = X
    

    It will be preprocessed to a bar.pp.ml file, which will be dropped in the build directory and used instead of bar.ml. This is how this mechanism works, and it is designed to work only with OCaml source files. If it suits you, you just need to fix the suffixes, i.e., rename your .ll files to .ml and specify a preprocess stanza that uses your preprocessor instead of the cpp that I used in the example.

    The mechanism described above is called "preprocessing via user actions", which should not be confused with the more general custom rule stanza (which also uses actions). The common use of this stanza is to define rules of the form,

    (rule
     (target foo.data)
     (deps foo.data.src)
     (action
      (with-stdin-from %{deps}
       (with-stdout-to %{target}
        (chdir %{workspace_root}
         (run ./tools/my_rewriter.sh))))))
    

    where ./tools/my_rewriter.sh will receive the contents of foo.data.src in stdin and everything it prints will be redirected to foo.data. (Note that ./tools/my_rewriter.sh is the path from the top-level of your project). You can't specify a wildcard, like

    (target *.data)
    (deps *.data.src)
    

    and expect it to be called for each file with the matching suffixes. Again, at the time of writing, such a mechanism is not implemented in dune. You have, however, two options as workarounds.

    Option 1. Autogenerating the Rules

    You can either rely on the OCaml Syntax and produce the dune file that contains such a rule replicated for each *.data.src in the folder. I wouldn't personally recommend this, as the status of the OCaml Syntax support is not clear and it might misbehave in general.

    Alternatively, you can add an extra stage to your build process, e.g., a ./configure script that will generate the dune file with all these rules.

    You can also write them manually, of course :)
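
    As a rough illustration of the generator idea, here is a minimal sketch of a script that prints one rule per input file. The file name gen_rules.ml and the function names are hypothetical, and the emitted rule simply mirrors the foo.data example above.

```ocaml
(* gen_rules.ml: hypothetical helper, not something dune provides. *)

(* Render one rule stanza for an input such as "foo.data.src". *)
let rule_for src =
  let target = Filename.chop_suffix src ".src" in
  String.concat "\n"
    [ "(rule";
      " (target " ^ target ^ ")";
      " (deps " ^ src ^ ")";
      " (action";
      "  (with-stdin-from %{deps}";
      "   (with-stdout-to %{target}";
      "    (chdir %{workspace_root}";
      "     (run ./tools/my_rewriter.sh))))))";
      "" ]

(* Keep only the *.data.src files, sorted so the output is deterministic. *)
let inputs_of files =
  files
  |> List.filter (fun f -> Filename.check_suffix f ".data.src")
  |> List.sort String.compare

(* All rules for a list of file names, separated by blank lines. *)
let rules_for files = String.concat "\n" (List.map rule_for (inputs_of files))

let () =
  (* With exactly one directory argument, print the rules for its files. *)
  match Sys.argv with
  | [| _; dir |] -> print_string (rules_for (Array.to_list (Sys.readdir dir)))
  | _ -> ()
```

    Running it as, say, ocaml gen_rules.ml test prints the rules for every *.data.src file in test; the configure step would then write that output into the dune file before dune is invoked.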

    Option 2. Using Globs and Directory Dependencies

    You can use glob_files and then change your action so that it takes a set of files and produces a set of files, e.g., using GNU parallel,

    (rule
      (deps (glob_files *.data.in))
      (action (run parallel cp {} {.} ::: %{deps})))
    

    And this rule for each <foo>.data.in will produce <foo>.data. (Of course, you can write your own for loop, instead of using parallel).

    The caveat with this approach is that, since the rule doesn't specify targets, dune will eventually delete all the files it produces. The problem is that, unlike deps, the targets field doesn't accept glob_files, which makes perfect sense, as the targets are not expected to exist when the rule is applied.

    To the rescue comes the new directory-targets feature. To enable it, you need the following in your dune-project (the lang must be greater than or equal to 3.0):

    (lang dune 3.0)
    (using directory-targets 0.1)
    

    Now you can put the test input data files that you would like to preprocess in the same folder as your test driver. In this case, I use *.data.src for the input files and test_foo.ml as the test driver:

    (rule
     (deps (glob_files *.data.src))
     (target (dir data))
     (action
      (progn
        (run mkdir -p data)
        (run parallel cp {} data/{.} ::: %{deps}))))
    
    
    (test
     (name test_foo)
     (deps data))
    

    The (run parallel cp {} data/{.} ::: %{deps}) action will call cp <file>.data.src data/<file>.data for each <file> matching *.data.src. You can substitute your own command, one that takes the set of input files and populates the data directory with the preprocessed files. This command could even be implemented in OCaml: just specify ./path/to/your/tool.exe as the command and dune will build it automatically from ./path/to/your/tool.ml (given an executable stanza for it).
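
    To make the OCaml variant concrete, here is a minimal sketch of such a tool. The name prep_data.ml, its command-line convention (output directory first, then the input files) and the identity preprocess function are all assumptions for illustration; a real tool would put its transformation in preprocess.

```ocaml
(* prep_data.ml: hypothetical stand-in for `parallel cp {} data/{.} ::: ...`. *)

(* "test/foo.data.src" -> "foo.data": drop the directory and the last extension. *)
let target_name src = Filename.remove_extension (Filename.basename src)

(* Read a whole file into a string. *)
let read_file path =
  let ic = open_in_bin path in
  Fun.protect ~finally:(fun () -> close_in ic) (fun () ->
      really_input_string ic (in_channel_length ic))

(* Write a string to a file, replacing any previous contents. *)
let write_file path contents =
  let oc = open_out_bin path in
  Fun.protect ~finally:(fun () -> close_out oc) (fun () ->
      output_string oc contents)

(* Placeholder transformation (plain copy); a real preprocessor goes here. *)
let preprocess contents = contents

(* Preprocess one input file into [out_dir]. *)
let process ~out_dir src =
  let dst = Filename.concat out_dir (target_name src) in
  write_file dst (preprocess (read_file src))

let () =
  (* Expected usage: prep_data.exe OUT_DIR FILE... ; do nothing otherwise. *)
  match Array.to_list Sys.argv with
  | _ :: out_dir :: srcs ->
    if not (Sys.file_exists out_dir) then Sys.mkdir out_dir 0o755;
    List.iter (process ~out_dir) srcs
  | _ -> ()
```

    Under these assumptions, the action in the rule above would become something like (run ./prep_data.exe data %{deps}), and the separate mkdir step would no longer be needed since the tool creates the directory itself.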

    In this setup, whenever you change an input *.data.src file, or any other dependency of the test, dune test will rebuild the data folder and correctly rerun the tests.

    For the sake of completeness, here are the contents of my test_foo.ml file,

    open Printf
    
    let () =
      Sys.readdir "data" |> Array.iter @@ fun file ->
      if Filename.check_suffix file ".data"
      then printf "testing with %s\n%!" file
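
    The driver above only prints the file names. As a sketch of how it could run one check per data file and fail the test run when any of them fails, something along these lines might work. The check function here is a hypothetical placeholder (it only rejects empty files); a real suite would call its lexer or parser instead.

```ocaml
(* Hypothetical per-file check; replace with a call into the lexer/parser. *)
let check contents = String.length contents > 0

(* Read a whole file into a string. *)
let read_file path =
  let ic = open_in_bin path in
  Fun.protect ~finally:(fun () -> close_in ic) (fun () ->
      really_input_string ic (in_channel_length ic))

(* The *.data files of [dir], sorted so the run order is stable. *)
let data_files dir =
  Sys.readdir dir |> Array.to_list
  |> List.filter (fun f -> Filename.check_suffix f ".data")
  |> List.sort String.compare

(* Names of the data files in [dir] for which [check] returns false. *)
let failures dir =
  data_files dir
  |> List.filter (fun f -> not (check (read_file (Filename.concat dir f))))

let () =
  (* Only run when the generated data directory is present. *)
  if Sys.file_exists "data" then
    match failures "data" with
    | [] -> print_endline "all data files passed"
    | failed ->
      List.iter (Printf.eprintf "FAILED: %s\n") failed;
      exit 1
```

    Exiting with a non-zero status is what makes dune test report the failure.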
    

    And here's a sample directory structure,

    $ tree
    .
    |-- bar.ml
    |-- dune
    |-- dune-project
    `-- test
        |-- bar.data.src
        |-- dune
        |-- foo.data.src
        `-- test_foo.ml
    
    1 directory, 7 files
    

    Feel free to poke me if you want to get a fully working example.