I'm implementing patterns mining algorithm, and usually input data are file with the following format
item1 item2 item3
item0 item3 item10
....
item30 item40 item30
where usually itemx
is a String
. To be efficient, I used to read the file with ByteString
which is faster than the default String
. Since the great task in patterns mining algorithms is comparison between items sets. I wonder How faster or slower my program will be if I change the input file format in order to make comparison between Int
instead of comparison between ByteString
. Here is the novel format :
1 2 3
0 3 10
....
30 40 30
thanks !
If you restrict yourself to just asking whether the equality function on Int - given by the eqInt#
primop - is faster than the equality function on bytestrings -
primop IntEqOp "==#" Compare
Int# -> Int# -> Bool
with commutable = True
vs
eq :: ByteString -> ByteString -> Bool
eq a@(PS fp off len) b@(PS fp' off' len')
| len /= len' = False -- short cut on length
| fp == fp' && off == off' = True -- short cut for the same string
| otherwise = compareBytes a b == EQ
{-# INLINE eq #-}
Then the Int case will be faster. No doubt.
However, if you have to parse your bytestring input (or String input) into Int
tokens first, you might lose.
The only way to really know here is to measure.