Using Clojure's data structure with MapDB


I tried to use Clojure's hash maps directly with MapDB and ran into weird behaviour. I read through the Clojure and MapDB sources but couldn't work out the cause.

At first, everything looks fine:

lein try org.mapdb/mapdb "1.0.6"

; defining a db for the first time
(import [org.mapdb DB DBMaker])
(defonce db (-> (DBMaker/newFileDB (java.io.File. "/tmp/mapdb"))
                .closeOnJvmShutdown
                .compressionEnable
                .make))
(defonce fruits (.getTreeMap db "fruits-store"))
(do (.put fruits :banana {:qty 2}) (.commit db))

(get fruits :banana)
=> {:qty 2}
(:qty (get fruits :banana))
=> 2
(first (keys (get fruits :banana)))
=> :qty
(= :qty (first (keys (get fruits :banana))))
=> true

CTRL-D
=> Bye for now!

Then I try to access the data again:

lein try org.mapdb/mapdb "1.0.6"

; loading the previously created db
(import [org.mapdb DB DBMaker])
(defonce db (-> (DBMaker/newFileDB (java.io.File. "/tmp/mapdb"))
                .closeOnJvmShutdown
                .compressionEnable
                .make))
(defonce fruits (.getTreeMap db "fruits-store"))

(get fruits :banana)
=> {:qty 2}
(:qty (get fruits :banana))
=> nil
(first (keys (get fruits :banana)))
=> :qty
(= :qty (first (keys (get fruits :banana))))
=> false
(class (first (keys (get fruits :banana))))
=> clojure.lang.Keyword

How can the same keyword be different with respect to =? Is there some weird reference problem happening?


Solution

  • The problem is caused by the way equality of keywords works. Looking at the implementation of the = function, we see that since keywords are neither java.lang.Number nor clojure.lang.IPersistentCollection, their equality is determined by the Object.equals method. Skimming the source of clojure.lang.Keyword, we learn that keywords don't override Object.equals, and therefore two keywords are equal iff they are the same object.
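
    The identity-based comparison can be reproduced without MapDB. The following is a minimal sketch, assuming the private one-argument constructor of clojure.lang.Keyword can be reached through reflection (an implementation detail that may differ between Clojure versions). rogue-qty is a hypothetical name for a keyword instance that bypasses interning, much like one rebuilt field by field by a serializer that knows nothing about keywords.

    ```clojure
    ;; Assumption: Keyword has a private constructor taking a Symbol.
    ;; Calling it directly skips Keyword/intern, so we get a second,
    ;; distinct object that still prints as :qty.
    (def rogue-qty
      (let [ctor (.getDeclaredConstructor
                   clojure.lang.Keyword
                   (into-array Class [clojure.lang.Symbol]))]
        (.setAccessible ctor true)
        (.newInstance ctor (object-array [(symbol "qty")]))))

    ;; Keywords created the normal way are interned: one object per name.
    (identical? :qty (keyword "qty"))
    ;; => true

    ;; The non-interned instance prints the same but is a different object,
    ;; so = (which falls back to Object.equals) reports false.
    (pr-str rogue-qty)
    ;; => ":qty"
    (= :qty rogue-qty)
    ;; => false

    ;; Map lookup fails for the same reason.
    (:qty {rogue-qty 2})
    ;; => nil
    ```

    This mirrors the symptoms in the question: the key prints as :qty and has class clojure.lang.Keyword, yet lookups and equality checks against the literal :qty fail.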

    The default serializer of MapDB is org.mapdb.SerializerPojo, a subclass of org.mapdb.SerializerBase. In its documentation we can read that it's a

    Serializer which uses ‘header byte’ to serialize/deserialize most of classes from ‘java.lang’ and ‘java.util’ packages.

    Unfortunately, it doesn't work as well with clojure.lang classes: it doesn't preserve the identity of keywords, which breaks equality. To fix this, let's write our own serializer using the EDN format (alternatively, you could consider, say, Nippy) and use it in our MapDB.

    (require '[clojure.edn :as edn])
    
    (deftype EDNSeralizer []
      ;; See docs of org.mapdb.Serializer for semantics.
      org.mapdb.Serializer
      (fixedSize [_]
        -1)
      (serialize [_ out obj]
        (.writeUTF out (pr-str obj)))
      (deserialize [_ in available]
        (edn/read-string (.readUTF in)))
      ;; MapDB expects serializers to be serializable.
      java.io.Serializable)
    
    (def edn-serializer (EDNSeralizer.))
    
    (import [org.mapdb DB DBMaker])
    (def db (.. (DBMaker/newFileDB (java.io.File. "/tmp/mapdb"))
                closeOnJvmShutdown
                compressionEnable
                make))
    
    (def more-fruits (.. db
                         (createTreeMap "more-fruits")
                         (valueSerializer edn-serializer)
                         (makeOrGet)))
    (.put more-fruits :banana {:qty 2})
    (.commit db)
    

    Once the more-fruits tree map is reopened in a JVM where EDNSeralizer is defined, the :qty keyword read back from the store will be the same object as any other :qty instance. As a result, equality checks will work properly.
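
    The claim that an EDN round trip restores keyword identity can be checked in memory, without MapDB. This is a small sketch that assumes only clojure.edn; round-trip is a hypothetical helper that mirrors the bodies of the serialize and deserialize methods above.

    ```clojure
    (require '[clojure.edn :as edn])

    ;; Hypothetical helper: print to an EDN string and read it back,
    ;; as the serializer does via writeUTF/readUTF.
    (defn round-trip [obj]
      (edn/read-string (pr-str obj)))

    (def restored (round-trip {:qty 2}))

    ;; The EDN reader interns keywords, so the restored key is the
    ;; very same object as the literal :qty.
    (identical? :qty (first (keys restored)))
    ;; => true
    (:qty restored)
    ;; => 2
    (= {:qty 2} restored)
    ;; => true
    ```

    Two caveats with this approach: the stored values must print as readable EDN, and java.io.DataOutput.writeUTF rejects strings whose encoded form exceeds 65535 bytes, so very large values would need a different encoding.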