Search code examples
python-3.xlispcosine-similarityhy

Cosine Similarity in linear time using Lisp


The cosine similarity of two lists can be calculated in linear time using a for-loop. I'm curious as to how one would achieve this using a Lisp-like language. Below is an example of my code in Python and Hy (Hylang).

Python:

def cos_sim(A,B):
    import math as _math
    n,da,db,d = 0,0,0,0

    for a,b in zip(A,B):
        n += a*b
        da += a*a
        db += b*b

    da = _math.sqrt(da)
    db = _math.sqrt(db)
    d = da*db

    return n / (d + 1e-32)

Hy (Lisp):

(import math)

(defn l2norm [a]
    (math.sqrt (reduce + (map (fn [s](* s s)) a))))

(defn dot [a b]
   (reduce + (map * a b)))

(defn cossim [a b]
   (/ (dot a b) (* (l2norm a) (l2norm b))))

Solution

  • "I'm curious as to how one would achieve this using a Lisp-like language." This really depends on which Lisp you are using. In Scheme you might do something similar to the posted Hy solution:

    (define (cos-sim-1 u v)
      (/ (dot-prod u v)
         (* (norm u) (norm v))))
    
    (define (dot-prod u v)
      (fold-left + 0 (map * u v)))
    
    (define (norm u)
      (sqrt (fold-left (lambda (acc x) (+ acc (* x x)))
                       0
                       u)))
    

    This is linear in time complexity, but it could be improved by a constant factor by passing over the input only once. Scheme provides a named let construct that can be used to bind a name to a procedure; this is convenient here as it provides a simple mechanism for building the dot product and norms:

    (define (cos-sim-2 u v)
      (let iter ((u u)
                 (v v)
                 (dot-product 0)
                 (U^2 0)
                 (V^2 0))
        (if (null? u)
            (/ dot-product (sqrt (* U^2 V^2)))
            (let ((x (car u))
                  (y (car v)))
              (iter (cdr u)
                    (cdr v)
                    (+ dot-product (* x y))
                    (+ U^2 (* x x))
                    (+ V^2 (* y y)))))))
    

    Both of these procedures assume that the input lists have the same length; it might be useful to add some validation code that checks this. Note that fold-left is standard in R6RS Scheme, but other standards rely on SRFIs for this, and some implementations may use different names, but the fold-left functionality is commonly available (perhaps as foldl or reduce).

    It is possible to solve the problem in Common Lisp using either of the basic methods shown above, though in Common Lisp you would use labels instead of named let. But it would be typical to see a Common Lisp solution using the loop macro. The Common Lisp standard does not guarantee tail call elimination (though some implementations do support that), so explicit loops are seen much more often than in Scheme. The loop macro is pretty powerful, and one way that you could solve this problem while passing over the input lists only once is this:

    (defun cos-sim (u v)
      (loop :for x :in u
            :for y :in v
            :sum (* x y) :into dot-product
            :sum (* x x) :into u2
            :sum (* y y) :into y2
            :finally (return (/ dot-product (sqrt (* u2 y2))))))
    

    Here are some sample interactions:

    Scheme (Chez Scheme):

    > (cos-sim-1 '(1 0 0) '(1 0 0))
    1
    > (cos-sim-1 '(1 0 0) '(-1 0 0))
    -1
    > (cos-sim-1 '(1 0 0) '(0 1 0))
    0
    > (cos-sim-1 '(1 1 0) '(0 1 0))
    0.7071067811865475
    
    > (cos-sim-2 '(1 0 0) '(1 0 0))
    1
    > (cos-sim-2 '(1 0 0) '(-1 0 0))
    -1
    > (cos-sim-2 '(1 0 0) '(0 1 0))
    0
    > (cos-sim-2 '(1 1 0) '(0 1 0))
    0.7071067811865475
    

    Common Lisp:

    CL-USER> (cos-sim '(1 0 0) '(1 0 0))
    1.0
    CL-USER> (cos-sim '(1 0 0) '(-1 0 0))
    -1.0
    CL-USER> (cos-sim '(1 0 0) '(0 1 0))
    0.0
    CL-USER> (cos-sim '(1 1 0) '(0 1 0))
    0.70710677