Union Function in Haskell

I have been getting my hands around coding in Haskell, but couldn't grasp the idea of implementing union function. I have also found some function definition embedded inside the Haskell platform. but the problem is I need a neat and understandable way to make it work. Anyone can help me with that?

Solution

Assuming you're talking about union :: Eq a => [a] -> [a] -> [a] which takes two input lists and returns a third list which contains all of the elements of each argument list, then it's defined in Data.List which is in the base package.

In the source it's divided into two functions, the generalized function unionBy which takes a custom definition of equality (a function of type equal to (==) :: a -> a -> Bool) and then defines the one that uses the Eq typeclass by passing in (==) as a concrete implementation of equality.

union                   :: (Eq a) => [a] -> [a] -> [a]
union                   = unionBy (==)

We can substitute (==) into the unionBy code, though, as Haskell lets us use equational reasoning.

union = unionBy (==)

-- so...

union       :: Eq a => [a] -> [a] -> [a]
union xs ys =  xs ++ foldl (flip (deleteBy (==))) (nubBy (==) ys) xs

This same pattern occurs twice more in the definition of unionBy in deleteBy and nubBy, both of which follow the same convention. delete removes an element from a list and nub returns a list of unique elements. We'll simplify the definition again to eliminate all traces of (==) and simply assume that the elements a have Eq defined.

union xs ys = xs ++ foldl (flip delete) (nub ys) xs

Now the definition is perhaps more readable. The union of xs and ys is xs appended to the unique ("nubbed") values of ys which have been processed by foldl (flip delete) _ xs. The net result of that foldl is to one by one try to delete each element of xs from (nub ys). What that ends up meaning is that union xs ys is xs appended to each unique element from ys less those in xs.

As an aside, with this source in hand we can notice some quirky behavior of union such as how it treats duplicates in the first argument differently from the second argument

union [1,1,2] [2] == [1,1,2]
union [2] [1,1,2] == [2,1]

which is a bit of a letdown, a result of using [] to represent a Set-like notion of union. However, if we view the results using Set.fromList then we're fine.

xs, ys :: Eq a => [a]
Set.fromList (xs `union` ys) == Set.fromList xs `Set.union` Set.fromList ys

which also gives us another definition of union

union xs ys = Set.toList (Set.fromList xs `Set.union` Set.fromList ys)

So how does that foldl trick work? Let's unpack the definition of foldl to see, again abusing equational reasoning.

union xs ys = xs ++ (case xs of
  []      -> nub ys
  (x:xs') -> foldl (flip delete) (delete x (nub ys)) xs'
  )

which should make the trick more evident—it cycles over the elements of xs, deleting them one by one from (nub ys).

While hopefully this helped to make the code in union a bit more clear, the real take home should be that equational reasoning is a powerful tool for dissecting Haskell code. Don't be afraid to simplify code directly by manually inlining the definition of a function.