Search code examples
haskellfoldinfinite-sequence

Remove consecutive duplicates from an infinite list via folding?


Consider one of these implementations of a function to remove consecutive duplicates from a list:

uniq :: Eq a => [a] -> [a]
uniq [] = []
uniq (x:xs) = x:uniq (dropWhile (x ==) xs)
uniq :: Eq a => [a] -> [a]
uniq (x:xs@(y:_))
 | x == y    = x:tail (uniq xs)
 | otherwise = x:uniq xs
uniq l = l

They both work as expected on both finite and infinite lists. To be more specific, for infinite lists, I expect take n $ uniq l to terminate as long as there are no infinitely long sequences of duplicate values before there are n values to return.

Now consider this attempt at such a function, using foldr:

uniq :: (Foldable t, Eq a) => t a -> [a]
uniq = foldr uniqHelper []
 where uniqHelper x [] = [x]
       uniqHelper x acc@(y:_)
        | x == y    =   acc
        | otherwise = x:acc

This works correctly on finite lists, but never terminates for infinite ones, because uniqHelper always needs to evaluate its second argument. Is this something that can be fixed with a more clever uniqHelper, or is it inherently impossible to use folding for this task?


Solution

  • You can transfer the removal of an element to the tail, so regardless of the value, we first "yield" x, and then we use a function (here tl) to evaluate the tail of the list:

    uniq :: (Foldable t, Eq a) => t a -> [a]
    uniq = foldr uniqHelper []
     where uniqHelper x acc = x : tl acc
               where tl acc@(x2:xs) | x == x2 = xs
                                    | otherwise = acc
                     tl [] = []

    We thus first yield x, and then worry about the next element later. If second parameter (the folded list of the tail of the list) contains the same element, then we "remove" it, otherwise, we retain the entire list.

    The above is also able to yield the first element of a repeat, for example:

    Prelude> head (uniq (repeat 1))
    1
    Prelude> take 5 $ uniq [1..]
    [1,2,3,4,5]
    Prelude> uniq [1,1,1,1,2,2,2,1,1,1,1,1,1,3,3,3,3,1,1]
    [1,2,1,3,1]
    

    Of course if there are an infinite amount of 0s followed by a 1, that 1 is never emitted:

    Prelude> uniq (repeat 0 ++ [1])
    [0