Search code examples
performancehaskellcomparisonbytestring

How faster Int comparison is than ByteString comparison in Haskell?


I'm implementing patterns mining algorithm, and usually input data are file with the following format

item1 item2 item3
item0 item3 item10
....
item30 item40 item30

where usually itemx is a String. To be efficient, I used to read the file with ByteString which is faster than the default String. Since the great task in patterns mining algorithms is comparison between items sets. I wonder How faster or slower my program will be if I change the input file format in order to make comparison between Int instead of comparison between ByteString. Here is the novel format :

1 2 3
0 3 10
....
30 40 30

thanks !


Solution

  • If you restrict yourself to just asking whether the equality function on Int - given by the eqInt# primop - is faster than the equality function on bytestrings -

    primop   IntEqOp  "==#"   Compare
       Int# -> Int# -> Bool
       with commutable = True
    

    vs

    eq :: ByteString -> ByteString -> Bool
    eq a@(PS fp off len) b@(PS fp' off' len')
      | len /= len'              = False    -- short cut on length
      | fp == fp' && off == off' = True     -- short cut for the same string
      | otherwise                = compareBytes a b == EQ
    {-# INLINE eq #-}
    

    Then the Int case will be faster. No doubt.

    However, if you have to parse your bytestring input (or String input) into Int tokens first, you might lose.

    The only way to really know here is to measure.