Suppose I have one million article entities in my backend with an inst attribute called date, or one million player entities with an int attribute called points. What's a good way to select the 10 latest articles or top-scoring players?
Do I need to fetch the whole millions to the peer and then sort and drop from them?
Yes, you would need to fetch all the data, since there's no index that would help you out here.
I would have created my own "index" and normalized this data. You can have a separate set of N entities where you keep as many as you'd like. You could start with 10, or consider storing 100 to trade some (possibly negligible) speed for more flexibility. This index can be stored on a separate "singleton" entity that you add as part of your schema.
;; The attribute that stores the index
{:db/id #db/id[:db.part/db]
:db/ident :indexed-articles
:db/valueType :db.type/ref
:db/cardinality :db.cardinality/many
:db.install/_attribute :db.part/db}
;; The named index entity.
{:db/id #db/id[:db.part/db]
:db/ident :articles-index}
You can have a database function that does this. Every time you insert a new entity that you want to "index", call this function.
[[:db/add tempid :article/title "Foo]
[:db/add tempid :article/date ....]
[:index-article tempid 10]]
The implementation of index-article could look like this:
{:db/id #db/id[:db.part/user]
:db/ident :index-article
:db/fn #db/fn {:lang "clojure"
:params [db article-id idx-size]
:code (concat
(map
(fn [article]
[:db/retract
(d/entid db :articles-index)
:indexed-articles
(:db/id article)])
(->> (datomic.api/entity db :articles-index)
(sort-by (fn [] ... implement me ... ))
(drop (dec idx-size))))
[[:db/add (d/entid db :articles-index) :indexed-articles article-id]])}}
Disclaimer: I haven't actually tested this function, so it probably contains errors :) The general idea is that we remove any "overflow" entities from the set, and add the new one. When idx-size is 10, we want to ensure that only 9 items are in the set, and we add our new item to it.
Now you have an entity you can lookup from index, :articles-index, and the 10 most recent articles can be looked up from the index (all refs are indexed), without causing a full database read.
;; "indexed" set of articles.
(d/entity db :articles-index)