Search code examples
listfunctional-programmingsmlsmlnj

Traversing a list until certain criterion is met


I would like to create a simple SML program that traverses a list from left to right.Let's say I have a list of N items of K different types.For example the list 1 3 1 3 1 3 3 2 2 1 has 10 numbers of 3(1,2,3) types.

What I would like to to is go through this list from left to right and stop when i have found all K different numbers.In this case I would stop right after stumbling upon the first 2.

This could be done by spliting the list in head and tail in each step and processing the head element.However how could I keep track of the different numbers I have found?

This could be done in C/C++ by simply holding a counter and a boolean array with K elements. If i stumble upon an element i with bool[i]=false i make it true and counter=counter+1.

It is stated though that arrays are not the best option for SML so i was wondering if i have to use another data structure or if i have to create a new function to check each time if i have seen this element before(this would cost in time complexity).


Solution

  • how could I keep track of the different numbers I have found?

    [...] in C/C++ by [...] a boolean array with K elements

    Abstractly I would call the data structure you want a bit set.

    I'll give you two answers, one using a sparse container and one using a bit set.

    Sparse

    I'd use a list to keep track of the elements you've already seen:

    fun curry f x y = f (x, y)
    
    val empty = []
    fun add x set = curry op:: x set
    fun elem x set = List.exists (curry op= x) set
    
    fun seen k xs =
        let fun seen_ 0 _ _  = true
              | seen_ _ [] _ = false
              | seen_ k (x::xs) set =
                  if elem x set
                  then seen_ k xs set
                  else seen_ (k-1) xs (add x set)
        in seen_ k xs empty end
    

    You could also use a balanced binary tree as set type; this would reduce lookup to O(lg n). The advantage of using an actual container (list or tree) rather than a bit array is that of sparse arrays/matrices. This works for ''a lists.

    Bit set

    [...] boolean array with K elements [...]

    If i stumble upon an element i [...]

    Until this point, you haven't said that elements are always unsigned integers from 0 to K-1, which would be a requirement if they should be representable by a unique index in an array of length K.

    SML has a module/type called Word / word for unsigned integers (words). Adding this constraint, the input list should have type word list rather than ''a list.

    When you make an array of primitive types in many imperative, compiled languages, you get mutable, unboxed arrays. SML's Array type is also mutable, but each bool in such an array would be boxed.

    An easy way to get an immutable, unboxed array of bits would be to use bitwise operations on an IntInf (SML/NJ; implementations vary); it would automatically grow as a bit is flipped. This could look like:

    fun bit x = IntInf.<< (1, x)
    
    val empty = IntInf.fromInt 0
    fun add x set = IntInf.orb (set, bit x)
    fun elem x set = IntInf.> (IntInf.andb (set, bit x), 0)
    

    The function seen would be the same.

    The fact that k is decreased recursively and that set grows dynamically means that you're not restricted to elements in the range [0,K-1], which would have been the case with an array of size K.

    Example use:

    - seen 5 [0w4, 0w2, 0w1, 0w9];
    val it = false : bool
    
    - seen 5 [0w1, 0w2, 0w3, 0w4, 0w8];
    val it = true : bool
    

    This solution uses a lot of memory if the elements are large:

    - seen 1 [0w100000000];
    *eats my memory slowly*
    val it = true : bool
    

    Additional things you could do:

    • Create a module, structure BitSet = struct ... end that encapsulates an abstract type with the operations empty, add and elem, hiding the particular implementation (whether it's an IntInf.int, or a bool Array.array or an ''a list).
    • Create a function, fun fold_until f e xs = ... that extracts the recursion scheme of seen_ so that you avoid manual recursion; a regular foldl is not enough since it continues until the list is empty. You could build this using error-aware return type or using exceptions.
    • Consider Bloom filters.