I've been trying to solve problem 1330 from acm.timus.ru in Haskell. Basically, it boils down to this: 1) read from stdin an array A of length N (N < 10^4) and M pairs of integers (M < 10^5); 2) for each (from, to) pair, print the sum of subarray A[from..to] to stdout.
Since SO won't let me post more than 2 URLs as part of this question, I will refer to files in my Github repository below.
I came up with two solutions, which share most of the code. The first one (1330_slow.hs) uses Prelude functions (getLine/read/words) and is somewhat slow:
$ ./bench.sh slow_hs
slow_hs
Time inside the program: 2.18
MD5 (output.slow_hs.txt) = 89bcf8fd69a7fce953595d329c8f033a
The other solution (1330.hs) ditches these functions, replacing them with their Data.ByteString.Char8 equivalents (B.getLine/B.readInt/B.words), and performs decently well:
$ ./bench.sh hs
hs
Time inside the program: 0.27
MD5 (output.hs.txt) = 89bcf8fd69a7fce953595d329c8f033a
The time limit on this problem is 500 ms, so while 270 ms is fast enough (and comparable to my solutions in other languages, such as C++ and Go), 2180 ms doesn't cut it. So why is my first solution so ridiculously slow? Even after following the profiling tips from Real World Haskell I still can't make sense of this (all I could figure out was that the majority of the time was spent in the readIntPair function, which didn't help much).
If you want to do some testing of your own, I have a Python input generator (gen_test.py) and a pre-generated input file (input.txt) in case you don't have Python installed. There is also a diff (slow_fast_diff.txt) between the two solutions.
As others have said, it's not that ByteString is fast, it's that String is very, very slow.
A ByteString stores one byte per character, plus some book-keeping overhead. A String stores something like 12 bytes per character (depending on whether you're running in 32-bit or 64-bit mode). It also stores each character in non-contiguous memory, so each character has to have space individually allocated to it, individually scanned by the garbage collector, and eventually individually deallocated again. This means poor cache locality, lots of allocator time, and lots of garbage collection time. In short, it's hellishly inefficient.
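To make the "one allocation per character" point concrete, here is a small sketch. String is just a synonym for [Char], i.e. a singly linked list. The CharList type and its Nil/Cons constructors below are made-up names for illustration only; they mirror the shape of the built-in list, not its actual runtime representation.

```haskell
import Data.Char (isDigit)

-- A hand-rolled list with the same shape as [Char].
-- CharList, Nil and Cons are illustrative names, not library types.
data CharList = Nil | Cons Char CharList

-- Build the hand-rolled list from an ordinary String.
fromString :: String -> CharList
fromString = foldr Cons Nil

-- Walking the list follows one pointer per character.
lengthCL :: CharList -> Int
lengthCL Nil           = 0
lengthCL (Cons _ rest) = 1 + lengthCL rest

main :: IO ()
main = do
  let line = "12 345" :: String
  -- Six characters means six separate Cons cells.
  print (lengthCL (fromString line))
  -- Because String is [Char], ordinary list functions work on it directly.
  print (takeWhile isDigit line)
```

Every Cons cell here is its own heap object, which is roughly the situation the garbage collector faces with a real String.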
Basically, ByteString does what C does, what Java does, what C++ does, what C# does, what VB does, and what just about every other programming language does with strings. No other language I'm aware of has a default string type as inefficient as Haskell does. (Even Frege, which is a Haskell dialect, uses a more efficient string type.)
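For comparison, here is roughly what the two parsing styles from the question look like side by side. I haven't copied the asker's code; readIntPairS and readIntPairB are hypothetical names, and the error handling is only a placeholder. Both use functions the question already mentions (read/words and B.readInt/B.words).

```haskell
import qualified Data.ByteString.Char8 as B
import Data.Maybe (mapMaybe)

-- Hypothetical String-based parser: words and read both walk a linked
-- list of Char, allocating new list cells along the way.
readIntPairS :: String -> (Int, Int)
readIntPairS s = case map read (words s) of
  [a, b] -> (a, b)
  _      -> error "expected two integers"

-- Hypothetical ByteString-based parser: B.words and B.readInt operate on
-- a packed byte buffer, so no per-character cells are created.
readIntPairB :: B.ByteString -> (Int, Int)
readIntPairB s = case mapMaybe (fmap fst . B.readInt) (B.words s) of
  [a, b] -> (a, b)
  _      -> error "expected two integers"

main :: IO ()
main = do
  print (readIntPairS "3 7")
  print (readIntPairB (B.pack "3 7"))
```

Both calls give the same pair; the difference is only in how much allocation happens on the way there, which adds up over 10^5 input lines.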
I should point out that ByteString.Char8 only handles Latin-1 characters. It doesn't cope with random Unicode characters at all. That probably isn't a problem for a programming challenge like this, but for a "real system" it might well be. ByteString doesn't really deal with exotic characters or different character encodings or anything; it just assumes you want plain ASCII. That used to be a safe assumption; today, not so much.
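A quick way to see the Char8 limitation for yourself is a round trip through B.pack and B.unpack. This is only a sketch; the sample string is my own, written with a numeric escape so it doesn't depend on the source file's encoding.

```haskell
import qualified Data.ByteString.Char8 as B

main :: IO ()
main = do
  -- '\955' is the Greek letter lambda (U+03BB), which is outside Latin-1.
  let original = "\955x" :: String
      packed   = B.pack original   -- each Char is truncated to 8 bits
  -- The character count survives the round trip...
  print (length original == B.length packed)
  -- ...but the lambda itself does not, so this prints False.
  print (B.unpack packed == original)
```

For contest input consisting of digits and spaces this never matters, but it is the reason you would not want Char8 for arbitrary user text.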