Search code examples
listrandomfunctional-programmingocaml

Just want choose some random elements of a list and return them in OCaml


I want to choose n distinct random elements in a list l and return them in choose_elements but I have a StackOverFlow error for sufficiently large lists!

I tried to use a tail_recursive function choose_elem_aux in order to do that but I think my condition List.mem is not efficient enough in term of complexity! I do this normally in other programming languages with a boolean mark array which I mark the index of each random number that is generated in true! But I can't do this in OCaml because I can't do multiple instruction into an if or else block! like this:

... else {
mark[r] =true ;
choose_elem_aux l n mark tmp ;
} ...


let choose l = 
  nth l (Random.int (List.length l)) ;;
  
let rec choose_elem_aux l n tmp =
  if n=0 then tmp
  else
    let r=choose l in 
    if List.mem r tmp then
      choose_elem_aux l n tmp 
    else 
      choose_elem_aux l (n-1) (r::tmp) ;;

let rec choose_elements l n = 
  choose_elem_aux l n [] ;;

StackOverflow for large list like:

choose_elements [1...10_000] 7 ;;

Solution

  • First of all, ocaml does allow you to write several statements in an if then else, it's just not written as in C-like languages.

    if condition then begin
      instruction1 ;
      instruction 2 ;
      (* ... *)
    end
    else begin
      (* ... *)
    end
    

    The begin (* ... *) end block works the same as parentheses, so you can also just parenthesise:

    if condition then (
      instruction1 ;
      instruction 2 ;
      (* ... *)
    )
    else (
      (* ... *)
    )
    

    So you can do your optimisation just fine.

    What happens here is that when writing if b then t else f in ocaml, you are building a value of type T if t : T and f : T. You can for instance write if b then 0 else 1 or if b then "Hello" else "Goodbye". It also works with the unit type (the type of most instructions):

    if b then instruction1 else instruction2
    

    The semicolon operator allows to do two instructions sequentially:

    (;) : unit -> unit -> unit
    

    Note that it is not the same as in most language where it marks the end of an instruction.

    The problem is that when you write

    if b then instruction1 else instruction2 ; instruction 3
    

    it is not understood as

    if b then instruction1 else (instruction2 ; instruction 3)
    

    as you wished, but as

    (if b then instruction1 else instruction2) ; instruction 3
    

    which also makes sense because the if expression also has type unit.