Given a list vs
, I want to get the list vs'
of the unique elements of vs
, as well as the indices of the elements of vs
in vs'
. For example, if vs = [7, 8, 7, 8, 9]
I want to get [7,8,9]
(the unique elements) and [0,1,0,1,2]
(the indices).
An easy implementation is:
import Data.List
import Data.Maybe
unique :: Eq a => [a] -> ([a], [Int])
unique vs = (vsnub, indices)
where
vsnub = nub vs
indices = map (\v -> fromJust $ elemIndex v vsnub) vs
Is there a more efficient option?
I've done an implementation of the following pseudo-code (saying that vs
is a list of numbers) with the help of mutable vectors, but this is slow:
n = length of vs
idx = list of n integers (to store the indices)
visited = [false, false, ..., false] (n elements)
nvs = list of n numbers (to store the unique elements)
count = 0
for(i = 0; i < n; ++i)
{
if(not visited[i])
{
nvs[count] = vs[i]
idx[i] = count
visited[i] = true
for(j = i+1; j < n; ++j)
{
if(vs[j] = vs[i])
{
visited[j] = true
idx[j] = count
}
}
count ++
}
}
nvs = first 'count' elements of nvs
Here is my code using mutable vectors (slow):
{-# LANGUAGE ScopedTypeVariables #-}
import Control.Monad ((>>=))
import Data.Vector.Unboxed (Unbox, Vector, freeze, (!))
import Data.Vector.Unboxed.Mutable (IOVector, new, write)
import qualified Data.Vector.Unboxed.Mutable as VM
unique' :: forall a . (Unbox a, Eq a) => [a] -> IO (Vector a, Vector Int)
unique' vs = do
let n = length vs
idx <- VM.replicate n 0 :: IO (IOVector Int)
visited <- VM.replicate n False :: IO (IOVector Bool)
nvs <- new n :: IO (IOVector a)
let inner :: Int -> Int -> Int -> IO ()
inner i j count | j == n = return ()
| otherwise =
if vs !! i == vs !! j
then do
write visited j True
write idx j count
inner i (j+1) count
else inner i (j+1) count
let go :: Int -> Int -> IO (IOVector a)
go i count | i == n = return $ VM.take count nvs
| otherwise = do
vst <- VM.read visited i
if not vst
then do
write nvs count (vs !! i)
write idx i count
write visited i True
_ <- inner i (i+1) count
go (i+1) (count + 1)
else go (i+1) count
nvs' <- go 0 0 >>= freeze
idx' <- freeze idx
return (nvs', idx')
It's helpful for things like these to focus on what state you actually need to maintain, and then encode that as an extra parameter to your recursive function. For this case, you need:
The first of those can be encapsulated with Map
, and the second with an Int
.
uniqInds :: Ord a => [a] -> ([a], [Int])
uniqInds = go 0 Map.empty
where
go i m [] = ([],[])
go i m (x:xs) = case Map.insertLookupWithKey (\_ _ j -> j) x i m of
(Nothing, m') -> first (x:) (second (i:) (go (i+1) m' xs))
(Just j , m') -> second (j:) (go i m' xs)
Map.insertLookupWithKey
is a function that lets us do the lookup and alter at the same time. first
and second
map a function over the first and second elements in a tuple, respectively. You can actually rewrite this function as a fold:
uniqInds :: Ord a => [a] -> ([a], [Int])
uniqInds xs = foldr f (\_ _ -> ([],[])) xs 0 Map.empty
where
f x xs i m = case Map.insertLookupWithKey (\_ _ j -> j) x i m of
(Nothing, m') -> first (x:) (second (i:) (xs (i+1) m'))
(Just j , m') -> second (j:) (xs i m')