Tags: mysql, memory, normalization

MySQL Optimal Field Memory Usage


The situation I am facing is that I am trying to work out the best way (in terms of memory and storage usage) to represent the inventory stock held by different stores at different points in time. The setup is as follows:

Table: Stores

A table which is a list of the different stores:

  • Store Id (PK)
  • Store Name
  • ... other details

Table: Stock

  • Stock Id (PK)
  • Stock Name
  • ... other details

Table: Store Stock Holdings

  • Store Id
  • Stock Id
  • Date
  • Quantity

(Store Id, Stock Id and Date act as a composite primary key, with Store Id and Stock Id as foreign keys)
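For concreteness, this first setup could be sketched in MySQL DDL roughly as follows (table/column names and types are illustrative, assuming INT ids and a DATE column):

```sql
-- Sketch of Option 1: one row per (store, stock, date).
CREATE TABLE store_stock_holdings (
    store_id INT  NOT NULL,
    stock_id INT  NOT NULL,
    date     DATE NOT NULL,
    quantity INT  NOT NULL,
    PRIMARY KEY (store_id, stock_id, date),
    FOREIGN KEY (store_id) REFERENCES stores (store_id),
    FOREIGN KEY (stock_id) REFERENCES stock (stock_id)
);
```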

Alternatively, I thought of storing the stock holdings as a JSON string:

Table: Store Stock Holdings

  • Store Id
  • Date
  • Stock Holdings
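A sketch of this alternative, assuming the holdings are serialized into a single VARCHAR/TEXT column (names illustrative; this table replaces, rather than complements, the one in Option 1):

```sql
-- Sketch of Option 2: the per-stock quantities are an opaque
-- serialized string, e.g. '{1,50;2,100;3,20}'.
CREATE TABLE store_stock_holdings (
    store_id       INT  NOT NULL,
    date           DATE NOT NULL,
    stock_holdings TEXT NOT NULL,
    PRIMARY KEY (store_id, date)
);
```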

So, for example, let's say store 1 has 50 lemons (code = 1), 100 oranges (code = 2), and 20 mangoes (code = 3) on Jan 1 2011. The three setups would represent it as follows:

Option 1:

Store Id, Stock Id, Date, Quantity

1 , 1, 2011-01-01, 50
1 , 2, 2011-01-01, 100
1 , 3, 2011-01-01, 20

Option 2:

Store Id, Date, Stock Holdings

1, 2011-01-01, {1,50;2,100;3,20}

Option 3:

This would reduce the replication of dates in Option 1 by splitting it into two tables as follows:

Index, Store Id, Date

1, 1, 2011-01-01

Index, Stock Id, Stock Holdings

1, 1, 50
1, 2, 100
1, 3, 20
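The two-table split could look like this in DDL (assuming an AUTO_INCREMENT surrogate key; names illustrative):

```sql
-- Sketch of Option 3: the (store, date) pair is factored out
-- behind a surrogate key.
CREATE TABLE store_dates (
    id       INT AUTO_INCREMENT PRIMARY KEY,
    store_id INT  NOT NULL,
    date     DATE NOT NULL,
    UNIQUE (store_id, date)
);

CREATE TABLE store_date_stock (
    store_date_id INT NOT NULL,
    stock_id      INT NOT NULL,
    quantity      INT NOT NULL,
    PRIMARY KEY (store_date_id, stock_id),
    FOREIGN KEY (store_date_id) REFERENCES store_dates (id)
);
```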

So the questions are:

  1. What are the speed and storage considerations for the different implementations? My thinking is that Options 2 and 3 are likely the better choices, as the date information isn't replicated.

  2. For Option 2, is the memory for the JSON string allocated dynamically? What I mean is that the JSON string could potentially be quite large, and one would need to allow for that. Would a new entry then take up the total allocation, or only the amount actually required by the JSON string? My understanding is that VARCHAR allocates memory dynamically. Would you suggest using VARCHAR?


Solution

  • MySQL is a relational database management system, so it is designed to operate on normalized relational data.

    This means that it cannot efficiently index JSON strings: you won't be able to, say, search by stockId efficiently, run aggregate queries grouped by stock, etc.

    The only thing you'll be able to do fast is retrieve the entire store contents (whether you need all of it or not) given the store id.

    So option 2 is only viable if you are using MySQL as a pure key-value storage. There are many systems on the market which are better suited for that purpose.

    As for choosing between option 1 and option 3, the latter just replaces a natural composite key (storeId, date) with a surrogate one (index).

    An integer is shorter than an INT + DATE combination, so there may be edge cases where this option is better (especially if you have few dates with very many entries per date, and you don't need to query all stores or all dates for a given stock). However, keeping everything in one table lets you create composite indexes on any combination of storeId, stockId and date, which may be crucial for performance.

    To help you choose between the two, we would need to know what kind of queries you will be running, but Option 1 (with storeId, stockId and date in one table) would definitely be the model of choice.
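To illustrate the point about indexing and aggregation, here are the kinds of queries Option 1 supports directly (assuming a hypothetical table `store_stock_holdings(store_id, stock_id, date, quantity)`); under Option 2 each of these would require fetching and parsing JSON strings in application code:

```sql
-- Total quantity per stock item across all stores on a given date:
SELECT stock_id, SUM(quantity) AS total
FROM store_stock_holdings
WHERE date = '2011-01-01'
GROUP BY stock_id;

-- A composite index lets the history of one stock item in one store
-- be read straight off the index, without scanning the table:
CREATE INDEX idx_stock_store_date
    ON store_stock_holdings (stock_id, store_id, date);

SELECT date, quantity
FROM store_stock_holdings
WHERE stock_id = 2 AND store_id = 1
ORDER BY date;
```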