I wrote something using Data.List.groupBy
. It didn't work as expected so I end up writting my own version of groupBy
: after all I'm not sure that the Data.List
one is supposed to do (there is no real documentation).
Anyway my tests passed with my version of groupBy
whereas it fails with the Data.List
.
I found (thanks quickcheck
) a case where the two function behaves differently, and I still don't understand why there is a difference between the two versions. Is the Data.List
version buggy or is is mine ? (Of course mine is a naive implementation and is probably not the most efficient way to do so).
Here is the code :
import qualified Data.List as DL
import Data.Function (on)
import Test.QuickCheck
groupBy' :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy' _ [] = []
groupBy' eq (x:xs) = xLike:(groupBy' eq xNotLike) where
xLike = x:[ e | e <- xs, x `eq` e ]
xNotLike = [ e | e <- xs, not $ x `eq` e ]
head' [] = Nothing
head' (x:xs) = Just x
prop_a s = (groupBy' by s) == (DL.groupBy by s) where
types = s :: [String]
by = (==) `on` head'
running in ghc quickCheck prop_a
returns ["", "a", ""]
*Main> groupBy' ((==) `on` head') ["","a",""]
[["",""],["a"]] # correct in my opinion
*Main> DL.groupBy ((==) `on` head') ["","a",""]
[[""],["a"],[""]] # incorrect.
What's happening ? I can't believe there is a bug in the haskell-platform .
Your version is O (n2) – which can be unacceptably slow in real-world use1.
The standard version avoids this by only grouping adjacent elements if they are equivalent. Hence,
*Main> groupBy ((==) `on` head') ["", "", "a"]
will yield the result you're after.
A simple way to obtain "universal grouping" with groupBy
is to first sort the list if that's feasible for the data type.
*Main> groupBy ((==) `on` head') $ DL.sort ["", "a", ""]
The complexity of this is only O (n log n).
1 This didn't prevent the committee from specifying nub
as O (n2)...