Search code examples
mysqldatabaseoracle-databaseentity-attribute-value

Designing a database : Which is the better approach?


I am designing a database and am wondering which approach should I use. I am going to describe the database I intend to design and the possible approaches that I can use to store the data in the tables.

Please recommend which approach I should use and why?

About the data:

A) I have seven attributes that need to be taken care of. These are just examples and not the actual ones I intend to store. Let me call them:

1)Name

2)DOB (Modified..I had earlier put in age here..)

3)Gender

4)Marital Status

5)Salary

6)Mother Tongue

7)Father's Name

B) There will be a minimum of 10000 rows in the table and they can go up from there in the long term

C) The number of attributes can change over the period of time. That is, new attributes can be added to the existing dataset. No attributes will ever be removed.

Approach 1

Create a table with 7 attributes and store the data as it is. Added new columns if and when new attributed need to be added.

  • Pro: Easier to read the data and information is well organized

  • Con: There can be a lot of null values in certain rows for certain attributes for which values are unknown.

Approach 2

Create a table with 3 attributes. Let them be called :

1) Attr_Name : Stores the attribute name . eg name,age,gender ..etc

2) Attr_Value :Stores value for the above attribute, eg : Tom, 25, Male

3) Unique ID : Uniquely identifies the Name, Value pair in the database. eg. SSN

So, in approach 2, in case new attributes need to be added for certain rows, we can just add them to the hashmap we have created without worrying about null values.

  • Pro: Hashmap structure. Eliminates nulls.

  • Con: Data is not easy to read. Information cannot be easily grasped.

C) The Question

Which is the better approach.?

I feel that approach 1 is the better approach. Because its not too tough to handle null values and data is well organized and its easy to grasp this king of data. Please suggest which approach I should use and why?

Thanks!


Solution

  • Your second option is one of teh worst design mistakes you can make. This should only be done when you have hundreds of attributes that change constantly and are in no way the same from object to object (such as medical lab tests). If you need to do that, then do not under any circumstances use a relational database to do it. NOSQL database handle EAV designs better by far than relational ones.

    Another problem with design 2 is that it becomes almost impossible to have good data integrity as you cannot correctly enforce FKs and data types and add contraints to the data. Since this stuff shoudl never be designed to happen only in the application since things other than the application often affect the data, this factor alone is enough to make your second idea foolish and foolhardy.

    The first design will perform better in general. It will be easier to write queries and it will force you to think about what needs to change when you add an attribute (this is a plus not a minus) instead of having to design to always show all attributes whether you need them or not. If you would have a lot of nulls, then add a related table rather than more columns(you can have one-to-one related tables). Usually in this case you might have something that you know only a subset of the records will have and they often fall into groupings by subject fairly naturally. For instance you might have general people related attributes (name, phone, email, address) that belong in one table. Then you might have student-related attributes that belong in a separate table and teacher-related attributes that belong in a third table. Or you might have things you need for all insurance policies and separate tables for vehicle insurance, health insurance, House insurance and life insurance.

    There is a third design possibility. If you have a set of attributes you know up front then put them in one table and have an EAV table only for attributes that cannot be determined at design time. This is the common pattern when the application wants to have the flexibility for the user to add customer specific data fields.