Search code examples
pythonperformancehaskellio

Haskell program involving `read` is much slower than an equivalent Python one


As part of a programming challenge, I need to read, from stdin, a sequence of space-separated integers (on a single line), and print the sum of those integers to stdout. The sequence in question can contain as many as 10,000,000 integers.

I have two solutions for this: one written in Haskell (foo.hs), and another, equivalent one, written in Python 2 (foo.py). Unfortunately, the (compiled) Haskell program is consistently slower than the Python program, and I'm at a loss for explaining the discrepancy in performance between the two programs; see the Benchmark section below. If anything, I would have expected Haskell to have the upper hand...

What am I doing wrong? How can I account for this discrepancy? Is there an easy way of speeding up my Haskell code?

(For information, I'm using a mid-2010 Macbook Pro with 8Gb RAM, GHC 7.8.4, and Python 2.7.9.)

foo.hs

main = print . sum =<< getIntList

getIntList :: IO [Int]
getIntList = fmap (map read . words) getLine

(compiled with ghc -O2 foo.hs)

foo.py

ns = map(int, raw_input().split())
print sum(ns)

Benchmark

In the following, test.txt consists of a single line of 10 million space-separated integers.

# Haskell
$ time ./foo < test.txt 
1679257

real    0m36.704s
user    0m35.932s
sys     0m0.632s

# Python
$ time python foo.py < test.txt
1679257 

real    0m7.916s
user    0m7.756s
sys     0m0.151s

Solution

  • read is slow. For bulk parsing, use bytestring or text primitives, or attoparsec.

    I did some benchmarking. Your original version ran in 23,9 secs on my computer. The version below ran in 0.35 secs:

    import qualified Data.ByteString.Char8 as B
    import Control.Applicative
    import Data.Maybe
    import Data.List
    import Data.Char
    
    main = print . sum =<< getIntList
    
    getIntList :: IO [Int]
    getIntList =
        map (fst . fromJust . B.readInt) . B.words <$> B.readFile "test.txt"
    

    By specializing the parser to your test.txt file, I could get the runtime down to 0.26 sec:

    getIntList :: IO [Int]          
    getIntList =
        unfoldr (B.readInt . B.dropWhile (==' ')) <$> B.readFile "test.txt"