
If a Julia script is run from the command line, does it need to be re-compiled every time?


I've read through quite a bit of documentation and many questions, but I'm still confused about this.

In the Profiling section of the documentation, it's suggested to first run the target function in the REPL once, so that it's already compiled before being profiled. However, what if the script is fairly complicated and is intended to be run from the command line, taking arguments? When the julia process finishes and I run the script a second time, is the compilation performed again? Posts like https://stackoverflow.com/a/42040763/1460448 and "Julia compiles the script every time?" give conflicting answers. They also seem to be old, while Julia is constantly evolving.

In my experience, the second run takes just as long as the first, and the startup time is quite long. How should I optimize such a program? Adding __precompile__() doesn't seem to have changed the execution time at all.

Also, what should I do when I want to profile such a program? All resources on profiling talk about doing so in the REPL.


Solution

  • Please correct me if I am wrong, but it sounds like you have written some long script, say, myfile.jl, and then from your OS command line you are calling julia myfile.jl args.... Is this correct? Also, it sounds like myfile.jl does not define much in the way of functions, but is instead just a sequence of commands. Is this correct? If so, then as has been suggested in the comments on the question, this is not the typical workflow for Julia, for two reasons:

    1) Calling julia from the command line, i.e. julia myfile.jl args..., is equivalent to opening a REPL, running an include command on myfile.jl, and then closing the REPL. The initial call to include will compile any methods that are needed for the operations in myfile.jl, which takes time. But since you're running from the command line, once the include is finished, the REPL automatically closes, and all that compiled code is thrown away. This is what DNF means when he says the recommended workflow is to work within a single REPL session and not close it until you are done for the day, or unless you deliberately want to recompile all the methods you are using.

    2) Even if you are working within a single REPL session, it is extremely important to wrap pretty much everything you do in functions (this is a very different workflow from languages like MATLAB). If you do this, Julia will compile methods for each function that are specialized on the types of the input arguments that you are using. This is essentially why Julia is fast. Once a method has been compiled, it remains available for the entire REPL session, but is disposed of when you close the REPL. Critically, if you do not wrap your operations in functions, then this specialized compilation does not occur (the type of a non-constant global variable can change at any time, so the compiler cannot specialize code that uses it), and so you can expect very slow code. In Julia, we call this "working in the global scope". Note that this feature of Julia encourages a coding style consisting of breaking your tasks down into lots of small specialized functions rather than one behemoth consisting of 1000 lines of code. This is a good idea for many reasons. (In my own codebase, many functions are single-liners, and most are 5 lines or less.)
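As a rough illustration of the two points above, here is a minimal sketch of how myfile.jl could be laid out so that all the work happens inside functions. The function names (parse_numbers, mean_of, main) are hypothetical and chosen only for this example, and the script is assumed to take a list of numbers as arguments.

```julia
# myfile.jl: hypothetical layout, all names are illustrative only.

# Convert command-line strings to Float64 values.
# The typed comprehension guarantees a Vector{Float64}, even for empty input.
parse_numbers(args) = Float64[parse(Float64, a) for a in args]

# Small helper; Julia compiles a specialized method for each argument type
# the first time it is called with that type.
mean_of(xs) = sum(xs) / length(xs)

# Entry point: everything the script does happens inside a function,
# not in the global scope.
function main(args)
    xs = parse_numbers(args)
    println("mean = ", mean_of(xs))
end

# Call main only when the file is run as a script (julia myfile.jl 1 2 3),
# not when it is include()d from a REPL session.
if abspath(PROGRAM_FILE) == @__FILE__
    main(ARGS)
end
```

With a layout like this, you can also run include("myfile.jl") once in a REPL session and then call main(["1", "2", "3"]) as often as you like. Only the first call pays the compilation cost, and wrapping the calls in @time or @elapsed should make the difference visible.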

    The two points above are critical to understand if you are working in Julia. Once you are comfortable with them, however, I would recommend putting all your functions inside modules and calling your module(s) from an active REPL session whenever you need them. This has the additional advantage that you can add a __precompile__() statement at the top of your module, and Julia will then precompile some (but not necessarily all) of the code in that module. (In Julia 1.0 and later, packages are precompiled by default, so the explicit statement is no longer required.) The precompiled code does not disappear when you close the REPL, since it is stored on disk in a .ji file. So you can start a new REPL session, type using MyModule, and your precompiled code is immediately available. It only needs to be recompiled if you alter the contents of the module (and this all happens automatically).
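A module along these lines might look like the sketch below. MyModule is the name used above; the function mean_of is hypothetical and stands in for your own code. It is assumed here that the file is saved as MyModule.jl in a place where Julia can find it, for example a directory you have added with push!(LOAD_PATH, ...), or the src/ folder of a package.

```julia
# MyModule.jl: hypothetical module, contents are illustrative only.
module MyModule

export mean_of

# Keep the actual work in small functions so that Julia can compile
# specialized methods for the argument types you call them with.
mean_of(xs) = sum(xs) / length(xs)

end # module
```

From a REPL session you would then type using MyModule and call mean_of([1.0, 2.0, 3.0]). If you paste the module definition directly into the REPL instead of loading it from a file, use using .MyModule (with the leading dot); in that case nothing is cached on disk.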