Search code examples
listsplitocamlfold

OCaml split function


I'm currently studying for a CS exam and I'm having an hard time understanding an exercise from my book. The exercise is as follows:

Define, using FOLDR and without using explicit recursion, a function (split : ’a list -> ’a -> ’a list * ’a list) such that split l n returns a pair of lists. The first contains all the values preceding the first occurrence of n in l (in the same order), and the second contains all the remaining elements (in the same order). If n does not appear in l, there are no values preceding the first occurrence of n.

Examples: split [3;-5;1;0;1;-8;0;3] 0 = ([3;-5;1],[0;1;-8;0;3]), split [3;4;5] 7 = ([],[3;4;5])

This is the code written by my professor to solve the exercise:

let split l n =
  let f x (l1, l2, b) =
    if x = n then ([], x::(l1@l2), true)
    else if b then (x::l1, l2, b)
    else (l1, x::l2, b)
  in let (l1, l2, b) = foldr f ([], [], false) l in (l1, l2) ;;

I don’t understand that second line at all (let f x (l1, l2, b)).

How do those parameters get filled with a value, so that all the logic that comes with it makes sense? For example: what is x and how can it be compared to n if it has no value? What is the meaning of those Boolean values in b?

In addition I don't understand that foldr function in the last line and I don't find any documentation about it. In fact, even my compiler doesn’t understand what foldr is and gives me an error (*Unbound value foldr*). Initially I thought it was some kind of abbreviation for List.fold_right but if I try to replace with the latter I still get an error because the following parameters are not correct (

File "split.ml", line 6, characters 41-56:
Error: This expression has type 'a * 'b * 'c
       but an expression was expected of type 'd list
).

Thank you in advance for any help or advice.


Solution

  • I don't know whether this is allowed by OCAML's syntax rules or not, but let's add some extra white space to make it clearer:

    let split l n =
      let f x                      (   l1,        l2 , b    ) =
        if       x = n  then       (   [], x::(l1@l2), true )
        else if  b      then       (x::l1,        l2 , b    )  (* b is true  *)
        else                       (   l1, x::    l2 , b    )  (* b is false *)
      in let (l1, l2, b) = foldr f (   [],        [] , false)  l 
         in  (l1, l2) ;; 
    

    foldr is, in pseudocode,

    foldr f z [x1  ;   x2  ;  ...   ;   xn1  ;   xn             ]
     =
               x1 -f- (x2 -f- (... -f- (xn1 -f- (xn -f- z))...))
    

    where a -f- b denotes simple application f a b, just written infix for convenience. In other words,

    foldr f z [x1 ;            x2 ; ... ; xn1 ; xn]     (*  x_{n-1}  *)
     =
          f    x1  (foldr f z [x2 ; ... ; xn1 ; xn])    (*  x_{n-1}  *)
    

    whereas

    foldr f z []  =  z
    

    Thus the above is actually equivalent to

     =  let    t1  = f xn  z
        in let t2  = f xn1 t1          (*  x_{n-1}  *)
        in ....
        in let tn1 = f x2  tn2         (*  t_{n-2}  *)
        in let tn  = f x1  tn1         (*  t_{n-1}  *)
        in     tn
    

    You should now be able to see that what this does is to work on the list's elements from right to left, passing interim results to the subsequent applications of f on the left.

    You should also be able now to write that missing definition of foldr yourself.


    So if we substitute your specific definition of f, how it works for a list of e.g. three elements [x1; x2; x3] where x2 = n, is equivalent to

         let    (l1,   l2,   b  ) = (    [],             []    , false)
         in let (l1_3, l2_3, b_3) = (    l1,         x3::l2    , b    )
         in let (l1_2, l2_2, b_2) = (    [],   x2::(l1_3@l2_3) , true )
         in let (l1_1, l2_1, b_1) = (x1::l1_2,           l2_2  , b_2  )
         in     (l1_1, l2_1)
    

    i.e.

         let    (l1,   l2,   b  ) = (    [],               []  , false)
         in let (l1_3, l2_3, b_3) = (    [],           x3::[]  , false)
         in let (l1_2, l2_2, b_2) = (    [],  x2::([]@[x3])    , true )
         in let (l1_1, l2_1, b_1) = (x1::[], [x2 ;     x3]     , true )
         in     ([x1], [x2; x3])
    

    Thus the resulting lists are being built from the back.

    The Boolean flags being passed along allows the function to correctly handle cases where there are more than one elements equal to n in the list. The same effect can be achieved without any flags, with a small post-processing step instead, as

    let split l n =
      let f x           (   l1,        l2 ) =
         if x = n then  (   [], x::(l1@l2))
                  else  (x::l1,        l2 )
      in match  foldr f (   [],        [] ) l  with
         | (l1, []) -> ([], l1) 
         | (l1, l2) -> (l1, l2) ;; 
    

    (if that's not a valid OCaml code, take it as a pseudocode then).