
Why is MIT-Scheme so slow in writing out data?


In a larger program I'm writing out a small set (10^7) of numerical digits (0...9). This is very slow with MIT-Scheme 10.1.10 on a 2.6GHz CPU: it takes about 2 minutes.

I'm probably doing something wrong, such as not buffering, but I'm stuck after reading the reference guide. I reduced everything to the bare minimum:

(define (write-stuff port)
  (define (loop cnt)
    (if (> cnt 0)
        (begin (display "0" port)
               (loop (- cnt 1)))))
  (loop 10000000))

(call-with-output-file "tmp.txt" write-stuff)

Any hints would be welcome...

[EDIT] To make things clear: the data entries are unrelated to each other and are stored in a 2D vector. They can be considered random, so I don't want to group them (it's either one-by-one or all-at-once). You can consider the data to be defined by something like

(define (data width height)
  (make-initialized-vector width (lambda (x) 
    (make-initialized-vector height (lambda (x) 
      (list-ref (list #\0 #\1) (random 2)))))))

Apparently the switch between user mode and kernel mode takes a lot of time, so it's best to convert the data to one string and write it out in one shot, as @ceving suggested. That is fast enough for me, even though it still takes 20s for 16MB.

(define (data->str data)
  (string-append* (vector->list (vector-map vector->string data))))

(define dataset (data 4096 4096))

(call-with-output-file "test.txt" (lambda (p) 
  (display (data->str dataset) p)))
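A possible middle ground between one-by-one and all-at-once is one write per row, which avoids building the full 16MB string. This is only a sketch: `write-rows`, `sample`, and the file name `rows-tmp.txt` are made-up names for illustration, and it assumes the data is a vector of character vectors as produced by `data` above. Whether it is faster than the single-string version would have to be measured.

```scheme
;; Hypothetical helper: convert each row (a vector of characters)
;; to a string and write it with a single write-string call.
(define (write-rows data port)
  (vector-for-each
   (lambda (row)
     (write-string (vector->string row) port))
   data))

;; Small stand-in for the real 4096x4096 dataset.
(define sample
  (vector (vector #\0 #\1 #\1)
          (vector #\1 #\0 #\0)))

(call-with-output-file "rows-tmp.txt"
  (lambda (p) (write-rows sample p)))
```

With the real data, `(write-rows dataset p)` would take the place of `(display (data->str dataset) p)`.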

Solution

  • The problem is not that MIT-Scheme is slow. The problem is that you call the kernel function write excessively. Your program switches from user mode to kernel mode for every character, which takes a lot of time. If you do the same in Bash, it takes even longer.

    Your Scheme version:

    (define (write-stuff port)
      (define (loop cnt)
        (if (> cnt 0)
            (begin (display "0" port)
                   (loop (- cnt 1)))))
      (loop 10000000))
    (call-with-output-file "mit-scheme-tmp.txt" write-stuff)
    (exit)
    

    The wrapper to run the Scheme version:

    #! /bin/bash
    mit-scheme --quiet --load mit-scheme-implementation.scm
    

    On my system it takes about 1 minute:

    $ time ./mit-scheme-implementation 
    
    real    1m3,981s
    user    1m2,558s
    sys     0m0,740s
    

    The same for Bash:

    #! /bin/bash
    : > bash-tmp.txt
    n=10000000
    while ((n > 0)); do
      echo -n 0 >> bash-tmp.txt
      n=$((n - 1))
    done
    

    takes almost 2.5 minutes:

    $ time ./bash-implementation 
    
    real    2m25,963s
    user    1m33,704s
    sys     0m50,750s
    

    The solution is: do not execute 10 million kernel mode switches.

    Execute just one (or at least 4096 times fewer):

    (define (write-stuff port)
      (display (make-string 10000000 #\0) port))
    
    (call-with-output-file "mit-scheme-2-tmp.txt" write-stuff)
    (exit)
    

    Now the program needs just 11 seconds:

    $ time ./mit-scheme-implementation-2 
    
    real    0m11,390s
    user    0m11,270s
    sys     0m0,096s
    

    This is why the C library provides stream buffering: https://www.gnu.org/software/libc/manual/html_node/Stream-Buffering.html#Stream-Buffering
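
The same idea can be sketched in Scheme while keeping the per-character loop: collect the output in an in-memory string port first, then hand the result to the file port in one call. This is an illustration rather than a measured fix. `write-digits` and the file name are hypothetical, and the count is a parameter so the sketch can be tried with small values.

```scheme
;; Hypothetical helper: emit COUNT zero digits one at a time into an
;; in-memory string port, then write the accumulated string to PORT once.
(define (write-digits count port)
  (let ((text
         (call-with-output-string
           (lambda (string-port)
             (let loop ((cnt count))
               (if (> cnt 0)
                   (begin (write-char #\0 string-port)
                          (loop (- cnt 1)))))))))
    (write-string text port)))

(call-with-output-file "mit-scheme-3-tmp.txt"
  (lambda (p) (write-digits 1000 p)))
```

The loop still produces each digit individually, so unrelated data entries do not have to be grouped by hand; only the final write to the file is batched.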