
Comparing nested data structures containing potentially empty arrays of primitives with clojure.data/diff


I am using clojure.data/diff to compare nested data structures in my unit tests. It worked fine until I bumped into the issue that it (IMO) behaves inconsistently when encountering empty arrays of primitives.

Non-empty arrays of primitives compare fine against vectors containing the same type of objects (e.g. doubles). However, an empty primitive array does not compare equal (in the diff function's sense) to an empty vector, or even to another empty array of the same type.

Here's a REPL session showing my problem. I've added some comments.

nerom.nsd.dbserver=> (require '[clojure.data :as cd])
nil
;; this is as I would expect - a vector and a primitive array with 
;; same contents compare equal
nerom.nsd.dbserver=> (cd/diff [[1.1 2.2]] [(double-array [1.1 2.2])])
[nil nil [[1.1 2.2]]]
;; this is inconsistent with the previous - empty double array does 
;; not compare equal to an empty vector
nerom.nsd.dbserver=> (cd/diff [[]] [(double-array [])])
[[nil] [nil] nil]
;; two double arrays with the same contents compare equal
nerom.nsd.dbserver=> (cd/diff [(double-array [1.1 2.2])] [(double-array [1.1 2.2])])
[nil nil [[1.1 2.2]]]
;; except when they are empty, which is IMO inconsistent
nerom.nsd.dbserver=> (cd/diff [(double-array [])] [(double-array [])])
[[nil] [nil] nil]

What could I do to make empty arrays compare equal to an empty vector, or at least to an empty array of the same type?


Solution

  • I guess I've found a clue:

    if you take a look at the diff source you would see the following:

    (if (= a b)
        [nil nil a]
        (if (= (equality-partition a) (equality-partition b))
          (diff-similar a b)
          (atom-diff a b)))
    

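    if the two values are not =, diff next compares their equality-partition, defined in the protocol EqualityPartition, which returns a keyword naming the value's equality class (by default, arrays and other sequential collections all get :sequential)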
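
    To make the two checks concrete, here is a small sketch of what they return for the values in the question. It assumes a fresh REPL in which EqualityPartition has not been extended yet, and only uses functions from clojure.data and clojure.core; the comments show the values I would expect from the diff source quoted above.

    ```clojure
    (require '[clojure.data :as d])

    ;; Collections and arrays share the :sequential partition; scalars are :atom.
    (d/equality-partition [])                 ; :sequential
    (d/equality-partition (double-array []))  ; :sequential
    (d/equality-partition (int-array []))     ; :sequential
    (d/equality-partition 1.1)                ; :atom

    ;; Java arrays are compared by identity, so the initial (= a b) check
    ;; fails even for two empty arrays, while it succeeds for two empty vectors.
    (= (double-array []) (double-array []))   ; false
    (= [] [])                                 ; true

    ;; The empty arrays therefore fall through to diff-similar, which has
    ;; no elements to report as shared and returns nil in every position.
    (d/diff (double-array []) (double-array []))  ; [nil nil nil]
    (d/diff [] [])                                ; [nil nil []]
    ```

    That all-nil result is what the enclosing vector sees, which would explain why the nested diff reports [[nil] [nil] nil] instead of a shared element.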

    so if you extend this protocol to give each primitive array type its own partition, arrays of different types will no longer be treated as the same kind of value:

    (require '[clojure.data :as d])
    

    before:

    user> (d/diff [(double-array [])] [(int-array [])])
    [[nil] [nil] nil]
    

    and then you extend protocol:

    (extend-protocol d/EqualityPartition
      (Class/forName "[D")
      (equality-partition [_] :double-array))
    
    (extend-protocol d/EqualityPartition
      (Class/forName "[I")
      (equality-partition [_] :int-array))
    
    user> (d/equality-partition (double-array []))
    :double-array
    user> (d/equality-partition (int-array []))
    :int-array
    

    after:

    user> (d/diff [(double-array [])] [(int-array [])])
    [[#object["[D" 0x1e2f0f97 "[D@1e2f0f97"]] [#object["[I" 0x207c60b5 "[I@207c60b5"]] nil]
    

    while arrays of the same type are compared as before (so two empty ones still do not compare equal):

    user> (d/diff [(double-array [])] [(double-array [])])
    [[nil] [nil] nil]
    
    user> (d/diff [(double-array [1])] [(double-array [10])])
    [[[1.0]] [[10.0]] nil]
    

    so, looking further in this direction: you can tune the comparison by extending this protocol (and the other protocol in clojure.data, Diff)
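
    If the goal is only to make empty arrays compare equal to empty vectors in tests, one possible workaround that leaves the protocols alone is to convert every array to a vector before diffing. The following is a minimal sketch, not something taken from clojure.data: arrays->vectors and diff-normalized are hypothetical helper names, and the approach assumes it is acceptable to lose the array element type (an empty double array and an empty int array both become []).

    ```clojure
    (require '[clojure.data :as d]
             '[clojure.walk :as walk])

    (defn arrays->vectors
      "Hypothetical helper: walks a nested structure and replaces every
       Java array with a vector of its elements."
      [x]
      (walk/prewalk
       (fn [v]
         (if (and (some? v) (.isArray ^Class (class v)))
           (vec v)
           v))
       x))

    (defn diff-normalized
      "Hypothetical helper: clojure.data/diff after converting arrays to vectors."
      [a b]
      (d/diff (arrays->vectors a) (arrays->vectors b)))

    (diff-normalized [[]] [(double-array [])])
    ;; [nil nil [[]]]
    (diff-normalized [(double-array [])] [(double-array [])])
    ;; [nil nil [[]]]
    (diff-normalized [(double-array [1.1 2.2])] [[1.1 2.2]])
    ;; [nil nil [[1.1 2.2]]]
    (diff-normalized [(double-array [1])] [(double-array [10])])
    ;; [[[1.0]] [[10.0]] nil]
    ```

    Because the arrays are gone before diff runs, the initial = check succeeds for the empty case, and the result does not depend on how EqualityPartition has been extended for array classes.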