I have been getting my hands around coding in Haskell, but couldn't grasp the idea of implementing union function. I have also found some function definition embedded inside the Haskell platform. but the problem is I need a neat and understandable way to make it work. Anyone can help me with that?
Assuming you're talking about union :: Eq a => [a] -> [a] -> [a]
which takes two input lists and returns a third list which contains all of the elements of each argument list, then it's defined in Data.List
which is in the base
package.
In the source it's divided into two functions, the generalized function unionBy
which takes a custom definition of equality (a function of type equal to (==) :: a -> a -> Bool
) and then defines the one that uses the Eq
typeclass by passing in (==)
as a concrete implementation of equality.
union :: (Eq a) => [a] -> [a] -> [a]
union = unionBy (==)
We can substitute (==)
into the unionBy
code, though, as Haskell lets us use equational reasoning.
union = unionBy (==)
-- so...
union :: Eq a => [a] -> [a] -> [a]
union xs ys = xs ++ foldl (flip (deleteBy (==))) (nubBy (==) ys) xs
This same pattern occurs twice more in the definition of unionBy
in deleteBy
and nubBy
, both of which follow the same convention. delete
removes an element from a list and nub
returns a list of unique elements. We'll simplify the definition again to eliminate all traces of (==)
and simply assume that the elements a
have Eq
defined.
union xs ys = xs ++ foldl (flip delete) (nub ys) xs
Now the definition is perhaps more readable. The union
of xs
and ys
is xs
appended to the unique ("nub
bed") values of ys
which have been processed by foldl (flip delete) _ xs
. The net result of that foldl
is to one by one try to delete
each element of xs
from (nub ys)
. What that ends up meaning is that union xs ys
is xs
appended to each unique element from ys
less those in xs
.
As an aside, with this source in hand we can notice some quirky behavior of union
such as how it treats duplicates in the first argument differently from the second argument
union [1,1,2] [2] == [1,1,2]
union [2] [1,1,2] == [2,1]
which is a bit of a letdown, a result of using []
to represent a Set
-like notion of union
. However, if we view the results using Set.fromList
then we're fine.
xs, ys :: Eq a => [a]
Set.fromList (xs `union` ys) == Set.fromList xs `Set.union` Set.fromList ys
which also gives us another definition of union
union xs ys = Set.toList (Set.fromList xs `Set.union` Set.fromList ys)
So how does that foldl
trick work? Let's unpack the definition of foldl
to see, again abusing equational reasoning.
union xs ys = xs ++ (case xs of
[] -> nub ys
(x:xs') -> foldl (flip delete) (delete x (nub ys)) xs'
)
which should make the trick more evident—it cycles over the elements of xs
, deleting them one by one from (nub ys)
.
While hopefully this helped to make the code in union
a bit more clear, the real take home should be that equational reasoning is a powerful tool for dissecting Haskell code. Don't be afraid to simplify code directly by manually inlining the definition of a function.