Breakpoints in the argument-passing scheme of OCaml

Today, I was going through the source code of Jane Street's Core_kernel module and I came across the compose function:

(* The typical use case for these functions is to pass in functional arguments
   and get functions as a result. For this reason, we tell the compiler where
   to insert breakpoints in the argument-passing scheme. *)
let compose f g = (); fun x -> f (g x)

I would have defined the compose function as:

let compose f g x = f (g x)

The reason they give for defining compose the way they did is “because compose is a function which takes functions f and g as arguments and returns the function fun x -> f (g x) as a result, they defined compose the way they did to tell the compiler to insert a breakpoint after f and g but before x in the argument-passing scheme.”

So I have two questions:

Why do we need breakpoints in the argument-passing scheme?
What difference would it make if we defined compose the normal way?

Coming from Haskell, this convention doesn't make any sense to me.

Solution

This is an efficiency hack to avoid the cost of a partial application in the expected use case indicated in the comment.

OCaml compiles curried functions into fixed-arity constructs, using a closure to partially apply them where necessary. This means that calls of that arity are efficient - there's no closure construction, just a function call.

There will be a closure construction within compose for fun x -> f (g x), but this will be more efficient than the partial application. Closures generated by partial application go through a wrapper caml_curryN which exists to ensure that effects occur at the correct time (if that closure is itself partially applied).

The fixed arity that the compiler chooses is based on a simple syntactic analysis - essentially, how many arguments are taken in a row without anything in between. The Jane St. programmers have used this to select the arity that they desire by injecting () "in between" arguments.

In short, let compose f g x = f (g x) is a less desirable definition because it would result in the common two-argument case of compose f g being a more expensive partial application.

Semantically, of course, there is no difference at all.