arangodb arangojs

Which method of storing this particular data would be more efficient?


I have an existing database for a game - currently using MySQL to store information - and I wish to test out ArangoDB to compare speed.

I am wondering whether it would be better to store all of a player's information in a single collection, or whether it would be more efficient (or even just better practice) to split it across separate collections.

In MySQL I didn't really have a choice, but using ArangoDB I do.

For example, storing inventory information in MySQL:

    +---------------------------+
    | user_id | item_id | count |
    +---------+---------+-------+
    |       1 |       1 |     7 |
    |       1 |       2 |     4 |
    +---------+---------+-------+

Or in ArangoDB, I could do either:

  1. A single collection for all of the information:

    {
        _key: "Unique User ID",
        health: 100,
        money: 52.38,
        // .... ,
        inventory:
            {
                item1: 7,
                item2: 4
                // , ....
            }
    }
    
  2. Separate the above collection into two different ones (one for health, money, etc. and one for inventory data):

    // 'user' collection
    {
        _key: "Unique User ID",
        health: 100,
        money: 52.38,
        // ....
    }
    
    // 'inventory' collection
    {
        _key: "Unique User ID",
        item1: 7,
        item2: 4
        // , ....
    }
    

Which one of the two methods above (or even another I didn't think of) would be more efficient?


Solution

  • Well, the answer depends on how you expect your application to use the data. Here are some pointers on how I would choose how to store it:

    • Does the inventory data change infrequently, and is it usually needed when accessing user information? If that's the case, then you would probably be best off with option 1 (storing the inventory data with the user information).

    • Do you want to make the transition from relational to document storage as painless as possible? Then you can probably use option 2 with separate collections. ArangoDB supports joins across collections, and they work pretty well.

    • Now, if you do not mind a bit of a learning curve, I would suggest option 3: graph traversals. In this case you would store the data in a way similar to the following:

      // user vertex collection

      { _key: "1", health: 100, money: 52.38, .... }

      // item vertex collection

      { _key: "1", itemID: 1, .... },

      { _key: "2", itemID: 2, .... }

      // inventory edge collection

      { _from: "user/1", _to: "item/1", count: 7 },

      { _from: "user/1", _to: "item/2", count: 4 }

    Since ArangoDB supports graph traversals natively, this layout lets you fetch a user's inventory by following the edges that start at the user vertex, and edge collections are indexed on _from and _to, so those lookups are fast. The rule of thumb when moving from a relational to a graph database is that main tables become collections and intermediate (join) tables become edge collections. (Obviously there is more to it, but that is a good start.)
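    As a rough sketch of what an option 3 lookup could look like, the snippet below holds an AQL traversal as a string and mimics the same one-hop lookup in plain JavaScript so it runs without a server. The collection names `user`, `item` and `inventory`, the bind parameter `@start`, and the helper `inventoryOf` are assumptions made for illustration, not something from your schema.

```javascript
// Hypothetical AQL: follow `inventory` edges one hop out from a user vertex.
// Assumes vertex collections `user` and `item` and an edge collection `inventory`.
const inventoryQuery = `
  FOR item, edge IN 1..1 OUTBOUND @start inventory
    RETURN { item: item._key, count: edge.count }
`;

// In-memory stand-in for the edge collection, using the sample data above.
const edges = [
  { _from: "user/1", _to: "item/1", count: 7 },
  { _from: "user/1", _to: "item/2", count: 4 },
];

// Illustrative helper (not an ArangoDB API): returns the same shape the
// AQL query would, by filtering edges that start at the given user.
function inventoryOf(userId, edgeList) {
  return edgeList
    .filter((e) => e._from === userId)
    .map((e) => ({ item: e._to.split("/")[1], count: e.count }));
}

console.log(inventoryOf("user/1", edges));
```

    With arangojs, a query string like this would typically be passed to `db.query` together with the bind variables, for example `{ start: "user/1" }`.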

    If I were starting from scratch, I would probably go with option 3 first. However, as I mentioned at the beginning, it depends on how you intend to use the data.
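    To make the difference between options 1 and 2 a bit more concrete, here is a minimal sketch. It shows an AQL read for each option as a string, plus an in-memory imitation of the option 2 join so the snippet runs on its own. The collection names `user` and `inventory`, the bind parameter `@key`, and the helper `loadPlayer` are assumptions for illustration only.

```javascript
// Option 1 (hypothetical AQL): a single document read returns everything.
const embeddedQuery = `RETURN DOCUMENT("user", @key)`;

// Option 2 (hypothetical AQL): join the two collections on the shared _key
// and nest the inventory under the user, giving the same shape as option 1.
const joinQuery = `
  FOR u IN user
    FILTER u._key == @key
    FOR inv IN inventory
      FILTER inv._key == u._key
      RETURN MERGE(u, { inventory: UNSET(inv, "_key", "_id", "_rev") })
`;

// In-memory stand-ins for the two collections from option 2.
const users = new Map([["1", { _key: "1", health: 100, money: 52.38 }]]);
const inventories = new Map([["1", { _key: "1", item1: 7, item2: 4 }]]);

// Illustrative helper (not an ArangoDB API): mimics the join above.
function loadPlayer(key) {
  const user = users.get(key);
  if (!user) return null;
  // Drop the inventory document's own _key before nesting it.
  const { _key, ...items } = inventories.get(key) ?? {};
  return { ...user, inventory: items };
}

console.log(loadPlayer("1"));
```

    Either way the application ends up with the same object; the difference is whether the database reads one document or two to build it.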