Search code examples
listfunctional-programmingpattern-matchingocaml

find longest repeating sequence in list


If I have a List like this:

[i;i;i;a;b;b;a;i;i;c] 
(*the longest repeating sequence would be [i;i]*)
[i;i;i;i] 
(*here the max_pattern would be [i;i] (has to repeat, no overlapping*)

[t;f;f;t] 
(*here it would be [t] as t is, in this case, 
the first element that has a repeating pattern in the list*)

my idea:

  • take the first element from the list and divide the list- where list_one has all the elements to the left of the first element. and list_two all the elements to the right.

  • then check if the element has a match in either one of the two lists.

  • if it does set the current max to the element.

  • now concatenate the next element to the right of the current element in the original list to the current element and again look in list_one and list_two if there is a match.

  • after the length of the concatenation reaches a point where it is >(size_of list / 2) stop.

  • now go to the first step but with the next element and repeat until every element in the list is checked.

example:

[t;f;f;t]
(*first round*)
[t][][f;f;t]
(*match in last elem*)
current_max = [t]
(*second round*)
[t;f][][f;t]
(*from here on no more matches*)
(*go to the next element, split lists again, and proceed with mentioned steps*)
[f][t][f;t]
(*match in f*) 
(*repeat from here on...*) 

I don't know if this algorithm has flaws. I am trying to implement this in OCaml but I think there might be an easier way to do this.


Solution

  • I'm not sure I understand the problem based on your examples. If you're trying to find sequences of repeating values, this is pretty straightforward. Let's look at a way to solve it in terms of List.fold_left.

    List.fold_left 
      (fun (max_seq, cur_seq) x ->
         match (max_seq, cur_seq) with
         (* Set the info up on the first iteration. *)
         | None, None -> (Some (1, x), Some (1, x))
         (* These should basically never occur. *)
         | None, Some (cur_len, cur_val) -> (cur_seq, cur_seq)
         | Some (max_len, max_val), None -> (max_seq, max_seq)
         (* Where the real magic happens. *)
         | Some (max_len, max_val), Some (cur_len, cur_val) -> 
           if x = cur_val && cur_len >= max_len then
             let new_val = Some (cur_len + 1, cur_val) in
             (new_val, new_val)
           else if x = cur_val then   
             (max_seq, Some (cur_len + 1, cur_val))
           else
             (max_seq, Some (1, x))
       )
      (None, None)
      [1; 1; 5; 7; 2; 2; 2; 2; 2; 2; 8; 8; 1; 1; 1; 1; 1]
    

    Result:

    (Some (6, 2), Some (5, 1))
    

    Because we need to deal with the prospect of an empty list, we're going to use the option type for the tuples representing the maximum sequence observed, and the current sequence we're tracking.

    Given that when we observe both values being None, we set both the max and current sequences to the only value observed so far with a sequence length of 1, the next two cases are basically just there to ensure exhaustive matching:

         | None, Some (cur_len, cur_val) -> (cur_seq, cur_seq)
         | Some (max_len, max_val), None -> (max_seq, max_seq)
    

    The real magic happens here:

         | Some (max_len, max_val), Some (cur_len, cur_val) -> 
           if x = cur_val && cur_len >= max_len then
             let new_val = Some (cur_len + 1, cur_val) in
             (new_val, new_val)
           else if x = cur_val then   
             (max_seq, Some (cur_len + 1, cur_val))
           else
             (max_seq, Some (1, x))
    

    As we fold over each value in the list:

    1. If it's a continuation of the current sequence, and the length is either the same or greater than the max sequence, then we have a new max sequence.

    2. Otherwise, we have a continuation of the sequence, but it's not a new max.

    3. Otherwise, we have a new sequence to track.

    The end result will give use two values representing the max sequence length and value, and the current sequence and value. We can use pattern matching to extract this information and strip out what we need.

    For instance:

    let max_seq lst =
      let (max, _) = List.fold_left 
        (fun (max_seq, cur_seq) x ->
           match (max_seq, cur_seq) with
           | None, None -> (Some (1, x), Some (1, x))
           | None, Some (cur_len, cur_val) -> (cur_seq, cur_seq)
           | Some (max_len, max_val), None -> (max_seq, max_seq)
           | Some (max_len, max_val), Some (cur_len, cur_val) -> 
             if x = cur_val && cur_len >= max_len then
               let new_val = Some (cur_len + 1, cur_val) in
               (new_val, new_val)
             else if x = cur_val then   
               (max_seq, Some (cur_len + 1, cur_val))
             else
               (max_seq, Some (1, x))
         )
        (None, None)
        lst
      in
      max
    

    And now we can simply call it on a list.

    # max_seq [1; 1; 5; 7; 2; 2; 2; 2; 2; 2; 8; 8; 1; 1; 1; 1; 1];;
    - : (int * int) option = Some (6, 2)
    

    As a newcomer to the language, if it helps you to understand List.fold_left, it's a very easy function to implement, and the implementation is often useful to see when trying to wrap your brain around it. I'll call my version foldl.

    let rec foldl f init lst =
      match lst with
      | [] -> init
      | x::xs -> foldl f (f init x) xs