Search code examples
haskellbytestring

Efficiently turn a ByteString into a hex representation


I needed to be able to give the hex representation of a SHA512 hash. Maybe I just didn't look hard enough, but I could find any functions on Hackage to do it. So I wrote an implementation using unfoldrN. It's definitely fast enough for my purposes, but I'm wondering if anyone knows of a faster approach.

I've put my implementation up on Github as a gist: https://gist.github.com/2356925 . The file also includes a simple implementation based on Numeric.showHex, a QuickCheck test and a criterion benchmark. My current results of the simple version versus the unfoldrN version are:

benchmarking simple
mean: 4.677296 ms, lb 4.656011 ms, ub 4.696684 ms, ci 0.950
std dev: 104.2791 us, lb 87.77023 us, ub 128.1627 us, ci 0.950
found 5 outliers among 100 samples (5.0%)
  4 (4.0%) low mild
variance introduced by outliers: 15.195%
variance is moderately inflated by outliers

benchmarking unfoldrN_MS1
mean: 370.0101 us, lb 365.9819 us, ub 373.8619 us, ci 0.950
std dev: 20.17016 us, lb 16.92772 us, ub 24.08982 us, ci 0.950
found 14 outliers among 100 samples (14.0%)
  7 (7.0%) low mild
  7 (7.0%) high mild
variance introduced by outliers: 52.467%
variance is severely inflated by outliers

Anyone want to take a stab at improving it?


Solution

  • Going lower-level,

    import Data.ByteString.Internal
    import Foreign.Ptr
    import Foreign.Storable
    import qualified Data.ByteString as B
    import Data.ByteString.Unsafe
    import Data.Bits
    import Data.Word
    
    maxLen :: Int
    maxLen = maxBound `quot` 2
    
    hexDig :: Word8 -> Word8
    hexDig d
        | d < 10    = d + 48
        | otherwise = d + 87
    
    toHex :: ByteString -> ByteString
    toHex bs
        | len > maxLen = error "too long to convert"
        | otherwise    = unsafeCreate nl (go 0)
          where
            len = B.length bs
            nl  = 2*len
            go i p
                | i == len  = return ()
                | otherwise = case unsafeIndex bs i of
                                w -> do poke p (hexDig $ w `shiftR` 4)
                                        poke (p `plusPtr` 1) (hexDig $ w .&. 0xF)
                                        go (i+1) (p `plusPtr` 2)
    

    I could reduce the conversion time on my box by another factor of about 3.5. Making sample a bit longer (25000), I got

    benchmarking simple
    mean: 13.76532 ms, lb 13.64184 ms, ub 13.88680 ms, ci 0.950
    std dev: 633.2413 us, lb 582.6342 us, ub 687.9701 us, ci 0.950
    variance introduced by outliers: 44.438%
    variance is moderately inflated by outliers
    
    benchmarking unfoldrN_MS1
    mean: 430.5705 us, lb 424.9206 us, ub 438.5689 us, ci 0.950
    std dev: 33.85429 us, lb 26.25623 us, ub 45.74915 us, ci 0.950
    found 4 outliers among 100 samples (4.0%)
      3 (3.0%) high mild
      1 (1.0%) high severe
    variance introduced by outliers: 69.726%
    variance is severely inflated by outliers
    
    benchmarking LowHex
    mean: 123.6000 us, lb 123.0551 us, ub 124.7084 us, ci 0.950
    std dev: 3.837497 us, lb 1.869370 us, ub 6.470112 us, ci 0.950
    found 6 outliers among 100 samples (6.0%)
      4 (4.0%) high mild
      2 (2.0%) high severe
    variance introduced by outliers: 25.818%
    variance is moderately inflated by outliers
    

    For the original 500 long sample, it was

    benchmarking simple
    mean: 2.603306 ms, lb 2.583054 ms, ub 2.629212 ms, ci 0.950
    std dev: 116.5341 us, lb 81.61409 us, ub 191.3293 us, ci 0.950
    found 7 outliers among 100 samples (7.0%)
      2 (2.0%) low severe
      3 (3.0%) low mild
      1 (1.0%) high severe
    variance introduced by outliers: 42.490%
    variance is moderately inflated by outliers
    
    benchmarking unfoldrN_MS1
    mean: 83.19349 us, lb 82.88474 us, ub 83.58283 us, ci 0.950
    std dev: 1.771460 us, lb 1.486104 us, ub 2.174729 us, ci 0.950
    found 14 outliers among 100 samples (14.0%)
      12 (12.0%) high mild
      2 (2.0%) high severe
    variance introduced by outliers: 14.225%
    variance is moderately inflated by outliers
    
    benchmarking LowHex
    mean: 24.50564 us, lb 24.41683 us, ub 24.61241 us, ci 0.950
    std dev: 497.1908 ns, lb 415.6366 ns, ub 609.7594 ns, ci 0.950
    found 5 outliers among 100 samples (5.0%)
      5 (5.0%) high mild
    variance introduced by outliers: 13.256%
    variance is moderately inflated by outliers