Every time we do cons and destructuring and similar operations on lists, we create copies of the original lists. This can become a performance issue with very large collections.
It's my understanding that to improve this situation, some programming languages implement lists as data structures that can be copied much more efficiently. Is there something like this in SML? Perhaps in the language definition, or as a feature that is implementation dependent?
If there's no such data structure, are arrays and mutability one pattern that optimizes on large lists? As long as the state is local to the function, can the function still be considered pure?
SML is multi-paradigm, but idiomatic SML is also functional-first, so both "lists with efficient copying" and "mutable arrays" approaches should make sense, depending on what the core language offers.
Is there an immutable data structure that is more efficient than the normal singly linked list for very large collections? If not, is there a native purely functional data structure that can optimize this scenario? Or should mutability and arrays be used internally?
Is there something like this in SML? Perhaps in the language definition, or as a feature that is implementation dependent?
You have a couple options here, depending on what you're willing to rely on.
The "initial basis" provided by the definition doesn't provide anything like this (I suppose an implementation could optimise lists by giving them some special compiler treatment and implementing them as copy-on-write arrays or some such, but I'm not aware of an implementation which does).
The widely implemented (SML/NJ, MLton, MoscowML, SML#, Poly/ML) SML Basis Library provides both functional and mutable options. The ones which come to mind are 'a Vector.vector
and 'a Array.array
(immutable and mutable, resp.). SML/NJ has an extension for vector
literals, e.g. as #[1, 2, 3]
, and I believe MLton supports this on an opt-in basis.
There are some other libraries, such as CMLib, which provide other immutable data structures (e.g., sequences)
@molbdnilo's commented above about Okasaki's Purely Functional Data Structures. I've only read his thesis, and not the book version (which I believe has additional material, although I don't know to what extent; seems that this has come up on on software engineering stack exchange). I'm not aware of any pre-packaged version of the data structures he presents there.
As long as the state is local to the function, can the function still be considered pure?
This obviously depends somewhat on your definition of what it means for a function to be pure. I've heard "observationally pure" for functions which make use of private state, which seems to be widespread enough that it's used by at least some relevant papers (e.g., https://doi.org/10.1016/j.tcs.2007.02.004)
Is there an immutable data structure that is more efficient than the normal singly linked list for very large collections? Or should mutability and arrays be used internally?
I think the above mentioned vectors are what you'd want (I imagine it depends on how large "very large" is), but obviously better options may exist, use-dependent. For instance, if insertion and deletion are more important than random access then there are likely better options.
Mutability might also make more sense depending on your workload, e.g., if random access is important, you're doing many updates, and desire good memory locality.