Tags: mysql, database-design, relational-database, database-schema, schema-design

Which of these two MySQL DB schema approaches would be more efficient for retrieval and sorting?


I'm confused as to which of the two db schema approaches I should adopt for the following situation.

I need to store multiple attributes for a website, e.g. page size, word count, category, etc., and the number of attributes may increase in the future. The purpose is to display this table to the user, who should be able to quickly filter and sort the data (so the table structure should support fast querying and sorting). I also want to keep a log of previous data to maintain a timeline of changes. The two table structure options I've thought of are:

Option A

website_attributes

id, website_id, page_size, word_count, category_id, title_id, ...... (up to 18 columns; some values may be null, and more columns may need to be added in the future)

website_attributes_change_log

Same table structure as above, with an added column for "change_update_time".

I feel the advantage of this schema is that queries will be easy to write, even when some attributes are linked to other tables, and sorting will be simple. The disadvantage, I guess, is that adding columns later can be problematic, since ALTER TABLE takes very long to run on large tables, and there could be many rows with many null columns.
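As a rough sketch of how Option A behaves, the snippet below builds a cut-down version of the two tables and logs a change before overwriting the live row. It uses Python's sqlite3 with an in-memory SQLite database purely as a stand-in for MySQL. Only a few of the 18 columns are included, and the sample rows, the timestamp, the UNIQUE constraint on website_id, and the update_page_size helper are all assumptions made for illustration.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; sample data is invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE website_attributes (
    id          INTEGER PRIMARY KEY,
    website_id  INTEGER NOT NULL UNIQUE,  -- assumption: one live row per site
    page_size   INTEGER,                  -- bytes, NULL when unknown
    word_count  INTEGER,
    category_id INTEGER
);
CREATE TABLE website_attributes_change_log (
    id                 INTEGER PRIMARY KEY,
    website_id         INTEGER NOT NULL,
    page_size          INTEGER,
    word_count         INTEGER,
    category_id        INTEGER,
    change_update_time TEXT NOT NULL
);
""")
conn.executemany(
    "INSERT INTO website_attributes "
    "(website_id, page_size, word_count, category_id) VALUES (?, ?, ?, ?)",
    [(1, 232032, 232, 4), (2, 9500, 1200, 4), (3, 100000, None, 7)],
)


def update_page_size(conn, website_id, new_size, changed_at):
    """Hypothetical helper: copy the current row to the log, then overwrite it."""
    with conn:  # one transaction: both statements commit or neither does
        conn.execute(
            """INSERT INTO website_attributes_change_log
                   (website_id, page_size, word_count, category_id,
                    change_update_time)
               SELECT website_id, page_size, word_count, category_id, ?
               FROM website_attributes WHERE website_id = ?""",
            (changed_at, website_id),
        )
        conn.execute(
            "UPDATE website_attributes SET page_size = ? WHERE website_id = ?",
            (new_size, website_id),
        )


update_page_size(conn, 2, 12000, "2012-06-01 10:00:00")

# Filtering and sorting are plain WHERE / ORDER BY on real, typed columns.
rows = conn.execute(
    "SELECT website_id, page_size FROM website_attributes "
    "WHERE category_id = ? ORDER BY page_size DESC",
    (4,),
).fetchall()
print(rows)  # [(1, 232032), (2, 12000)]
```

Because every attribute is its own typed column, each sortable column can also be given an ordinary index if the table grows large.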

Option B

website_attribute_fields

attribute_id, attribute_name (e.g. page_size), attribute_value_type (e.g. int)

website_attributes

id, website_id, attribute_id, attribute_value, last_update_time

The advantage here seems to be flexibility: I can add attributes whenever I need to, and I also save on storage space. However, as much as I'd like to adopt this approach, I feel that writing queries will be especially complex when displaying the tables [since I will need to display records for multiple sites at a time, and there will also be cross-referencing of values with other tables for certain attributes], and sorting the data might be difficult [given that this is not a column-based approach].
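To make the query complexity concrete, here is a minimal sketch of what it takes to turn Option B's rows back into one line per site. Again it uses Python's sqlite3 with an in-memory database as a stand-in for MySQL. The table and column names come from Option B, while the sample rows and the choice to store attribute_value as text are assumptions.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; sample data is invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE website_attribute_fields (
    attribute_id         INTEGER PRIMARY KEY,
    attribute_name       TEXT NOT NULL,
    attribute_value_type TEXT NOT NULL
);
CREATE TABLE website_attributes (
    id               INTEGER PRIMARY KEY,
    website_id       INTEGER NOT NULL,
    attribute_id     INTEGER NOT NULL,
    attribute_value  TEXT,   -- assumption: every value is stored as text
    last_update_time TEXT
);
INSERT INTO website_attribute_fields VALUES
    (1, 'page_size', 'int'),
    (2, 'word_count', 'int');
INSERT INTO website_attributes (website_id, attribute_id, attribute_value) VALUES
    (1, 1, '232032'),
    (1, 2, '232'),
    (2, 1, '9500');   -- site 2 has no word_count row at all
""")

# One CASE expression per attribute is needed to pivot rows into columns,
# so the query has to be edited every time a new attribute is introduced.
pivot_sql = """
SELECT a.website_id,
       MAX(CASE WHEN f.attribute_name = 'page_size'
                THEN a.attribute_value END) AS page_size,
       MAX(CASE WHEN f.attribute_name = 'word_count'
                THEN a.attribute_value END) AS word_count
FROM website_attributes AS a
JOIN website_attribute_fields AS f ON f.attribute_id = a.attribute_id
GROUP BY a.website_id
ORDER BY a.website_id
"""
pivoted = conn.execute(pivot_sql).fetchall()
print(pivoted)  # [(1, '232032', '232'), (2, '9500', None)]
```

Note that the values come back as strings, and a lookup such as the category name would need a further join on top of this pivot.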

A sample output of what I'd be looking at would be:

Site-A.com, 232032 bytes, 232 words, PR 4, Real Estate [linked to category table], ..

Site-B.com, ..., ..., ... ,...

The user also needs to be able to sort by any of the numeric columns, which might be difficult with approach B.
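The sorting concern can be illustrated with a small sketch. It assumes Option B stores every attribute_value as text, and uses Python's sqlite3 with an in-memory database as a stand-in for MySQL; the three sample rows are invented. In MySQL the equivalent cast would be written CAST(attribute_value AS SIGNED).

```python
import sqlite3

# In-memory SQLite stands in for MySQL; sample data is invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE website_attributes ("
    "website_id INTEGER, attribute_id INTEGER, attribute_value TEXT)"
)
# attribute_id 1 is taken to mean page_size (an assumption for this sketch).
conn.executemany(
    "INSERT INTO website_attributes VALUES (?, ?, ?)",
    [(1, 1, "232032"), (2, 1, "9500"), (3, 1, "100000")],
)

# Sorting the raw text compares character by character, so '9500' sorts
# above '232032' even though it is the smaller number.
as_text = [r[0] for r in conn.execute(
    "SELECT website_id FROM website_attributes "
    "WHERE attribute_id = 1 ORDER BY attribute_value DESC"
)]

# A cast gives the numeric order, at the cost of computing it per row.
as_number = [r[0] for r in conn.execute(
    "SELECT website_id FROM website_attributes "
    "WHERE attribute_id = 1 ORDER BY CAST(attribute_value AS INTEGER) DESC"
)]

print(as_text)    # [2, 1, 3]
print(as_number)  # [1, 3, 2]
```

Sorting on a cast expression like this generally cannot use a plain index on attribute_value, which is part of why fast sorting is harder with this layout.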

So I want to know whether I'd be doing the right thing by going with Option A, or whether there are better options that I might not have even considered in the first place.


Solution

  • I would recommend using Option A.

    You can mitigate the pain of long-running ALTER TABLE by using pt-online-schema-change.

    The upcoming MySQL 5.6 supports non-blocking ALTER TABLE operations.

    Option B is called Entity-Attribute-Value, or EAV. This breaks rules of relational database design, so it's bound to be awkward to write SQL queries against data in this format. You'll probably regret using it.

    I have posted several times on Stack Overflow describing pitfalls of EAV.
    Also in my blog: EAV FAIL.