I am using Xodus from Clojure and was evaluating the options for iterating through all key/value pairs lazily, as is common in Clojure.
My initial understanding was that all data access via a Cursor should happen inside of a read-only Transaction, as each transaction operates on its own database snapshot.
But if you have created a cursor inside of a transaction, it looks like it is still possible to continue iterating through the same transaction snapshot after the transaction has ended. In fact, it seems to be possible to keep using the cursor even after it has been closed.
I guess this is not a safe way to do this, because I suspect that at some point the GC will invalidate the snapshot.
Still, I am a little bit confused about exactly how long a cursor taken inside a specific transaction can be used, and I was not able to find the answer in the documentation.
Below is an example in Clojure, demonstrating the fact that the cursor can still be used to retrieve the data after the transaction is finished and after the keys were re-assigned.
Using xodus 1.3.232.
(ns chat-bot.xodus-cursor
  (:import [jetbrains.exodus.env Environments StoreConfig TransactionalComputable]
           [jetbrains.exodus.bindings IntegerBinding]))

(def store-name "test")

(defn startup []
  (Environments/newInstance "cursor-test"))

(defn shutdown [env]
  (.close env))

(defn fill [env n base]
  (.computeInTransaction
    env
    (reify TransactionalComputable
      (compute [this txn]
        (let [store (.openStore env store-name StoreConfig/WITHOUT_DUPLICATES txn)]
          (doseq [k (range n)]
            (.put store txn (IntegerBinding/intToEntry k) (IntegerBinding/intToEntry (+ base k)))))))))

(defn lazy-cursor [txn cursor has-next]
  (lazy-seq
    (when has-next
      (let [kv [(IntegerBinding/entryToInt (.getKey cursor)) (IntegerBinding/entryToInt (.getValue cursor))]]
        (println "realized" kv "txn finished" (.isFinished txn))
        (cons kv (lazy-cursor txn cursor (.getNext cursor)))))))

(defn get-seq [env]
  (.computeInReadonlyTransaction
    env
    (reify TransactionalComputable
      (compute [this txn]
        (let [store (.openStore env store-name StoreConfig/WITHOUT_DUPLICATES txn)]
          (with-open [cursor (.openCursor store txn)]
            (lazy-cursor txn cursor (.getNext cursor))))))))

(defn do-it []
  (let [env (startup)]
    (fill env 5 0) ;; put some data into the store
    (let [kvs0 (get-seq env)] ;; get the data sequence, not realized yet
      (fill env 5 10) ;; override the data
      (let [kvs1 (get-seq env)] ;; get the data sequence again
        (shutdown env)
        [kvs0 kvs1])))) ;; return both original and overridden data sequence
The output would be
(def s (do-it)) ;; sequences are still not realized
s ;; output sequences to realize them
realized [0 0] txn finished true
realized [1 1] txn finished true
realized [2 2] txn finished true
realized [3 3] txn finished true
realized [4 4] txn finished true
realized [0 10] txn finished true
realized [1 11] txn finished true
realized [2 12] txn finished true
realized [3 13] txn finished true
realized [4 14] txn finished true
=> [([0 0] [1 1] [2 2] [3 3] [4 4]) ([0 10] [1 11] [2 12] [3 13] [4 14])]
;; both the original and the re-assigned key/value sequences are returned
You can keep read-only transactions unfinished for as long as you wish, provided that you eventually finish (abort) them. Unfinished transactions prevent the deletion of old data that the database GC has already moved. So the time for which you can keep transactions unfinished depends on your workload: the heavier the write load, the shorter that time. E.g., if there are not many writes and the database size increases by 1-3% over several hours, you can keep read-only transactions open for hours without any impact on performance. The only drawback is that if your application is not able to close the database gracefully, then on the next start it will compute file utilization from scratch, i.e. it will traverse the entire database in the background.
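One way to follow this advice from Clojure is to open the read-only transaction explicitly, hand the lazy sequence to a caller-supplied function, and abort the transaction once that function returns. The following is only a minimal sketch under some assumptions: the namespace and the names `cursor-seq` and `call-with-kv-seq` are made up for illustration, and keys and values are assumed to be integers as in the question.

```clojure
(ns chat-bot.xodus-cursor-scoped ;; hypothetical namespace
  (:import [jetbrains.exodus.env Environment StoreConfig]
           [jetbrains.exodus.bindings IntegerBinding]))

;; Hypothetical helper: lazily walks the cursor forward,
;; decoding each key and value as an int.
(defn- cursor-seq [cursor]
  (lazy-seq
    (when (.getNext cursor)
      (cons [(IntegerBinding/entryToInt (.getKey cursor))
             (IntegerBinding/entryToInt (.getValue cursor))]
            (cursor-seq cursor)))))

;; Hypothetical helper: calls f with a lazy seq of [k v] pairs while the
;; read-only transaction is still open. The cursor is closed and the
;; transaction aborted when f returns or throws, so f must realize
;; everything it needs before returning.
(defn call-with-kv-seq [^Environment env store-name f]
  (let [txn (.beginReadonlyTransaction env)]
    (try
      (let [store (.openStore env store-name StoreConfig/WITHOUT_DUPLICATES txn)]
        (with-open [cursor (.openCursor store txn)]
          (f (cursor-seq cursor))))
      (finally
        (.abort txn)))))
```

It could be used like `(call-with-kv-seq env "test" #(into [] (take 3) %))`, which realizes only the first three pairs inside the transaction and returns them as a vector. The design choice here is to keep laziness within a bounded scope: the snapshot stays pinned only for as long as `f` runs, rather than for as long as an unrealized sequence happens to stay reachable.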