erlang, mnesia

Mnesia: how is the bag table type implemented?


I have a table with an integer key (a timestamp) that holds the time at which a particular record should be removed from the database. There is also a cleanup query, which selects the records in this table whose expiration time is less than now and removes them.

The Erlang documentation says that there are four table types: set, ordered_set, bag, and duplicate_bag.

  • set is implemented using hash tables, so a read takes O(1) time.
  • ordered_set is implemented using a tree, so a read takes O(log(n)) time, but it works better for queries over a contiguous range of keys.
  • I found no information about bag implementation.

ordered_set seems ideal, but I can't use it because two records can have the same timestamp. So the question is:

How is the bag table implemented, and is it good at querying contiguous key ranges? If not, how can I get "ordered_bag" functionality?


Solution

  • Mnesia's bag is implemented on top of ETS and DETS, like the other table types [1]. Also, Mnesia does not support duplicate_bag tables, as you can see from the documentation [2]. Since ETS and DETS bags are implemented as hash tables [3], we can conclude that a bag in Mnesia is a hash table with constant-time lookup by key. [4] also says that set and bag are implemented as hash tables in Mnesia.

    1. Learn You Some Erlang
    2. Erlang -- mnesia:create_table/2
    3. Erlang Programming by Francesco Cesarini and Simon Thompson, Ch.10
    4. Erlang and OTP in Action by Martin Logan, Eric Merritt, and Richard Carlsson, Ch.9

    On the rest of the question:

    No, bag is not good at querying contiguous key ranges. To get a range of keys from a bag table you must traverse the whole table. I see two possible solutions to that.
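To illustrate the full-traversal point, here is a minimal sketch of a range query against a bag table. The module name `bag_scan`, the table name `timer_bag`, and the sample records are hypothetical and chosen only for this example; it assumes a fresh node where Mnesia can start with a RAM-only schema. Because the key is unbound in the match pattern and the range lives only in the guard, the hash table cannot narrow the search, so every record is visited.

```erlang
-module(bag_scan).
-export([demo/0]).

%% Hypothetical bag table: {timer_bag, Timestamp, Id}.
%% Call demo/0 once per node; a second call fails because the table exists.
demo() ->
    ok = mnesia:start(),
    {atomic, ok} = mnesia:create_table(timer_bag,
        [{type, bag}, {attributes, [ts, id]}]),
    {atomic, ok} = mnesia:transaction(fun() ->
        lists:foreach(fun mnesia:write/1,
            [{timer_bag, 10, a}, {timer_bag, 10, b},
             {timer_bag, 20, c}, {timer_bag, 30, d}])
    end),
    Now = 20,
    %% Key position is a variable, so the select scans the whole table.
    MatchSpec = [{{timer_bag, '$1', '_'}, [{'<', '$1', Now}], ['$_']}],
    {atomic, Expired} = mnesia:transaction(fun() ->
        mnesia:select(timer_bag, MatchSpec)
    end),
    lists:sort(Expired).
```

The query returns the right records, but its cost grows with the size of the table rather than with the number of expired entries.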

    First, you can use an additional ordered_set table to keep the order, as @niahoo suggested. That lets you efficiently query all timestamps that fall in an interval and then delete the corresponding entries from your bag table, which is also efficient, since you know all the keys by that point.
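A rough sketch of this two-table approach follows. The module name `expiry_index`, the table names `expiry` (bag) and `expiry_ts` (ordered_set), and the attribute names are all assumptions made for illustration. The cleanup walks the ordered_set from its smallest key with `mnesia:first/1` and `mnesia:next/2`, stops at the first timestamp that has not expired, and only then reads and deletes by key in the bag.

```erlang
-module(expiry_index).
-export([setup/0, add/2, purge/1]).

%% Hypothetical tables: `expiry` holds the data, `expiry_ts` keeps keys ordered.
setup() ->
    ok = mnesia:start(),
    {atomic, ok} = mnesia:create_table(expiry,
        [{type, bag}, {attributes, [ts, id]}]),
    {atomic, ok} = mnesia:create_table(expiry_ts,
        [{type, ordered_set}, {attributes, [ts, unused]}]),
    ok.

%% Write the record and register its timestamp in the ordered index.
add(Ts, Id) ->
    mnesia:transaction(fun() ->
        ok = mnesia:write({expiry, Ts, Id}),
        ok = mnesia:write({expiry_ts, Ts, true})
    end).

%% Delete and return every record whose timestamp is less than Now.
purge(Now) ->
    mnesia:transaction(fun() ->
        Keys = expired_keys(mnesia:first(expiry_ts), Now, []),
        lists:flatmap(fun(Ts) ->
            Records = mnesia:read(expiry, Ts),
            ok = mnesia:delete({expiry, Ts}),     % removes all records with this key
            ok = mnesia:delete({expiry_ts, Ts}),
            Records
        end, Keys)
    end).

%% Collect keys in ascending order until one is not expired or the table ends.
expired_keys(Ts, Now, Acc) when is_integer(Ts), Ts < Now ->
    expired_keys(mnesia:next(expiry_ts, Ts), Now, [Ts | Acc]);
expired_keys(_TsOrEnd, _Now, Acc) ->
    Acc.
```

Keys are collected before anything is deleted, which keeps the traversal separate from the modifications. Both writes happen in one transaction so the index and the bag do not drift apart.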

    Second, you can use an ordered_set of {timestamp, [values]}. This requires some extra manual work when inserting or deleting a single entry, but it saves you from creating an additional table if you only need to query the entries grouped by timestamp.
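The second option might look roughly like the sketch below. The module and table name `expiry_group` and its attribute names are hypothetical. The "manual work" shows up as a read-modify-write inside a transaction for both `add/2` and `remove/2`, while the cleanup again walks the keys in ascending order.

```erlang
-module(expiry_group).
-export([setup/0, add/2, remove/2, purge/1]).

%% Hypothetical ordered_set table: {expiry_group, Timestamp, [Value]}.
setup() ->
    ok = mnesia:start(),
    {atomic, ok} = mnesia:create_table(expiry_group,
        [{type, ordered_set}, {attributes, [ts, values]}]),
    ok.

%% Prepend Value to the list stored under Ts, creating the entry if needed.
add(Ts, Value) ->
    mnesia:transaction(fun() ->
        Old = case mnesia:read(expiry_group, Ts, write) of
                  [{expiry_group, Ts, Vs}] -> Vs;
                  [] -> []
              end,
        mnesia:write({expiry_group, Ts, [Value | Old]})
    end).

%% Remove one Value from Ts; drop the whole entry when the list becomes empty.
remove(Ts, Value) ->
    mnesia:transaction(fun() ->
        case mnesia:read(expiry_group, Ts, write) of
            [{expiry_group, Ts, Vs}] ->
                case lists:delete(Value, Vs) of
                    [] -> mnesia:delete({expiry_group, Ts});
                    Rest -> mnesia:write({expiry_group, Ts, Rest})
                end;
            [] -> ok
        end
    end).

%% Delete and return {Ts, Values} for every timestamp less than Now.
purge(Now) ->
    mnesia:transaction(fun() ->
        Keys = expired_keys(mnesia:first(expiry_group), Now, []),
        [begin
             [{expiry_group, Ts, Vs}] = mnesia:read(expiry_group, Ts),
             ok = mnesia:delete({expiry_group, Ts}),
             {Ts, Vs}
         end || Ts <- lists:reverse(Keys)]
    end).

expired_keys(Ts, Now, Acc) when is_integer(Ts), Ts < Now ->
    expired_keys(mnesia:next(expiry_group, Ts), Now, [Ts | Acc]);
expired_keys(_TsOrEnd, _Now, Acc) ->
    Acc.
```

Taking a write lock in `mnesia:read/3` is a design choice in this sketch: it avoids two concurrent transactions both reading the old list and one overwriting the other's update.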