Search code examples
ocamlobject-identityreferenceequals

When are physically distinct values created in OCaml?


I'm trying to understand what the physical equality operators (Pervasives.(==) and Pervasives.(!=)) mean in OCaml.

The language manual says that the expression "" is a "constant", not an "expression":

6.5 Constants

constant ::== ... string-literal

but I can't find any verbiage indicating that constants are singly/pre-evaluated or pooled, and the REPL indicates that mutable string values are (thankfully) not pooled.

(* a *)  ""          == "";;             (* false *)
(* b *)  "foo"       == "foo";;          (* false *)
(* c *)  ""          == String.copy "";; (* false *)
(* d *)  ()          == ();;             (* true  *)
(* e *)  (42, -42)   == (42, -42);;      (* false *)
(* f *)  ("", 1)     == ("", 1);;        (* false *)
(* g *)  None        == None;;           (* true  *)
(* h *)  (Some None) == (Some None);;    (* false *)

Section "19.3 Representation of OCaml data types" suggests that the language specification requires that bools, ints, chars, the unit value, simple variants, and empty lists are selfless.

Does an implementation have to behave as above to be a complying OCaml implementation?

Can a complying OCaml implementation rewrite the pointer at b to point to a when a = b (* structurally *) is true and both are values of an immutable type (or effectively immutable values like zero length strings/arrays) as is sometimes done to reduce the number of reachable younger values in a generational GC?


Solution

  • As I read the language spec, there are very few guarantees about when values are distinct. I believe the only guarantee is in the documentation of the Pervasives module:

    On mutable types such as references, arrays, strings, records with mutable fields and objects with mutable instance variables, e1 == e2 is true if and only if physical modification of e1 also affects e2. On non-mutable types, the behavior of ( == ) is implementation-dependent; however, it is guaranteed that e1 == e2 implies compare e1 e2 = 0.

    One of the cool things about FP is that the compiler and runtime are free to do arbitrarily clever things with immutable values. So this guarantee is about all you would really want to have (IMHO).

    In sum, yes, the runtime or the compiler is free (again, IMHO) to share (and not share) immutable values any way it thinks will be helpful.

    I wouldn't interpret the representation section as part of the language specification. It's just useful documentation of the current implementation.