Search code examples
haskellunique

Unique elements of a list and corresponding indices


Given a list vs, I want to get the list vs' of the unique elements of vs, as well as the indices of the elements of vs in vs'. For example, if vs = [7, 8, 7, 8, 9] I want to get [7,8,9] (the unique elements) and [0,1,0,1,2] (the indices).

An easy implementation is:

import           Data.List
import           Data.Maybe

unique :: Eq a => [a] -> ([a], [Int])
unique vs = (vsnub, indices)
  where
  vsnub = nub vs
  indices = map (\v -> fromJust $ elemIndex v vsnub) vs

Is there a more efficient option?

I've done an implementation of the following pseudo-code (saying that vs is a list of numbers) with the help of mutable vectors, but this is slow:

n = length of vs
idx = list of n integers (to store the indices)
visited = [false, false, ..., false] (n elements)
nvs = list of n numbers (to store the unique elements)
count = 0 
for(i = 0; i < n; ++i)
{
  if(not visited[i])
  {
    nvs[count] = vs[i]
    idx[i] = count
    visited[i] = true
    for(j = i+1; j < n; ++j)
    {
      if(vs[j] = vs[i])
      {
        visited[j] = true
        idx[j] = count
      }
    }
    count ++
  }
}
nvs = first 'count' elements of nvs

EDIT

Here is my code using mutable vectors (slow):

{-# LANGUAGE ScopedTypeVariables #-}
import           Control.Monad               ((>>=))
import           Data.Vector.Unboxed         (Unbox, Vector, freeze, (!))
import           Data.Vector.Unboxed.Mutable (IOVector, new, write)
import qualified Data.Vector.Unboxed.Mutable as VM

unique' :: forall a . (Unbox a, Eq a) => [a] -> IO (Vector a, Vector Int)
unique' vs = do
  let n = length vs
  idx <- VM.replicate n 0 :: IO (IOVector Int)
  visited <- VM.replicate n False :: IO (IOVector Bool)
  nvs <- new n :: IO (IOVector a)
  let inner :: Int -> Int -> Int -> IO ()
      inner i j count | j == n = return ()
                      | otherwise =
                        if vs !! i == vs !! j
                          then do
                            write visited j True
                            write idx j count
                            inner i (j+1) count
                          else inner i (j+1) count
  let go :: Int -> Int -> IO (IOVector a)
      go i count | i == n = return $ VM.take count nvs
                 | otherwise = do
                   vst <- VM.read visited i
                   if not vst
                     then do
                       write nvs count (vs !! i)
                       write idx i count
                       write visited i True
                       _ <- inner i (i+1) count
                       go (i+1) (count + 1)
                     else go (i+1) count
  nvs' <- go 0 0 >>= freeze
  idx' <- freeze idx
  return (nvs', idx')

Solution

  • It's helpful for things like these to focus on what state you actually need to maintain, and then encode that as an extra parameter to your recursive function. For this case, you need:

    • The elements you've seen so far (and their positions).
    • The position in the output list you're at.

    The first of those can be encapsulated with Map, and the second with an Int.

    uniqInds :: Ord a => [a] -> ([a], [Int])
    uniqInds = go 0 Map.empty
      where
        go i m [] = ([],[])
        go i m (x:xs) = case Map.insertLookupWithKey (\_ _ j -> j) x i m of
            (Nothing, m') -> first (x:) (second (i:) (go (i+1) m' xs))
            (Just j , m') ->             second (j:) (go i     m' xs)
    

    Map.insertLookupWithKey is a function that lets us do the lookup and alter at the same time. first and second map a function over the first and second elements in a tuple, respectively. You can actually rewrite this function as a fold:

    uniqInds :: Ord a => [a] -> ([a], [Int])
    uniqInds xs = foldr f (\_ _ -> ([],[])) xs 0 Map.empty
      where
        f x xs i m = case Map.insertLookupWithKey (\_ _ j -> j) x i m of
            (Nothing, m') -> first (x:) (second (i:) (xs (i+1) m'))
            (Just j , m') ->             second (j:) (xs i     m')