functional-programming, scheme, lisp, purely-functional, mutability

When is it okay to modify a variable in functional languages?


So I'm teaching myself functional programming using Racket, and I love it so far. As an exercise, I've been trying to implement a few simple tasks in a purely functional way. I know immutability is an important part of functional style, but I was wondering whether there are times when mutation is acceptable.

I thought of a fun way to strip duplicate strings from a list of strings: a predicate function meant to be used with filter, shown here:

(define (make-uniquer)
  (let ([uniques '()])
    (lambda (x)
      ;; set! returns a non-#f value, so filter keeps x the first
      ;; time it is seen; later occurrences fall through to #f
      (if (not (member x uniques))
          (set! uniques (cons x uniques))
          #f))))

(define (uniquify x)
  (let ([uniquer (make-uniquer)])
    (filter uniquer x)))

As you can see, make-uniquer returns a closure over a list of strings to compare against for uniqueness, so it can act as a simple predicate for filter. But I'm destructively updating the closed-over list. Is this bad form, or is it okay to mutate local closed-over variables in this way?


Solution

  • In this case, I'd avoid a mutable implementation because a functional implementation can compete pretty well in terms of performance. Here are three versions (including the built-in remove-duplicates) of the function:

    #lang racket
    
    ;; Mutable version: the same closure trick, but a hash table
    ;; makes the membership check O(1) instead of O(n).
    (define (make-uniquer)
      (let ([uniques (make-hash)])
        (lambda (x)
          (if (not (hash-ref uniques x #f))
              (hash-set! uniques x #t)
              #f))))
    
    (define (uniquify x)
      (let ([uniquer (make-uniquer)])
        (filter uniquer x)))
    
    ;; Functional version: thread an immutable hash of seen elements
    ;; through for/fold instead of mutating any state.
    (define (uniquify-2 lst)
      (define-values (_ r)
        (for/fold ([found (hash)] [result '()])
                  ([elem (in-list lst)])
          (cond [(hash-ref found elem #f)
                 (values found result)]
                [else (values (hash-set found elem #t)
                              (cons elem result))])))
      (reverse r))
    
    (define randoms (build-list 100000 (λ (n) (random 10))))
    (time (for ([i 100]) (uniquify randoms)))
    (time (for ([i 100]) (uniquify-2 randoms)))
    (time (for ([i 100]) (remove-duplicates randoms)))
    
    ;; sanity check
    (require rackunit)
    (check-equal? (uniquify randoms) (uniquify-2 randoms))
    (check-equal? (uniquify randoms) (remove-duplicates randoms))
    

    The times I get for these (uniquify, uniquify-2, and remove-duplicates, in that order) are

    cpu time: 1348 real time: 1351 gc time: 0
    cpu time: 1016 real time: 1019 gc time: 32
    cpu time: 760 real time: 760 gc time: 0
    

    Not scientific numbers, so take this with a grain of salt. To be fair, I did tune uniquify-2 a bit since my first version was slower. I also improved the mutable version with a hash table, but maybe there are other optimizations that could be applied. Also, the built-in remove-duplicates is tuned for performance and does use a mutable data structure (though it avoids set!).
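
    To make that last point concrete, here's a minimal sketch (my own, not the actual remove-duplicates source) of that style: mutation is confined to a hash table created inside the function, and no variable is ever rebound with set!:

    ;; hypothetical sketch -- local mutable hash, but no set!
    (define (dedupe lst)
      (define seen (make-hash))
      (for/list ([x (in-list lst)]
                 #:unless (hash-ref seen x #f))
        (hash-set! seen x #t)
        x))

    The mutation is invisible from the outside, so callers still see what behaves like a pure function.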

    You may also be interested in the Guide entry on performance. It points out that use of set! can harm performance, so use it with care.
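
    If you want to avoid set! altogether, the usual alternative is to pass the updated state along explicitly, for example as accumulator arguments to a named let. A small sketch (the function and names are just for illustration):

    ;; accumulate n through recursive calls instead of rebinding it with set!
    (define (count-evens lst)
      (let loop ([lst lst] [n 0])
        (cond [(null? lst) n]
              [(even? (car lst)) (loop (cdr lst) (add1 n))]
              [else (loop (cdr lst) n)])))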