Search code examples
c#sql-serverentity-frameworkdatabase-designdatabase-normalization

De-normalize table or use Joins in Entity Framework


I need to know what are the tradeoffs of using a denormalized table vs using two separate tables and accessing the data using joins. I am using Entity Framework 4.

In my case I have two tables Order and OrderCategoryDetails. I am thinking whether merging these two tables into one single table is better?

  • If denormalized, the added columns (OrderCategory and OrderSubcategory will be will be sparse (could be 100% empty. Will always be at least 50% empty)
  • On the other hand, if I keep it as it is, I am worried about frequent join operations being executed (i.e. whenever I am querying for a specific Order, I would need information from OrderCategoryDetails too.

At present, I have normalized tables and use navigational properties:

  • To access Order Category information from OrderItem instance

    OrderItem orderItem = _context.OrderItems.Where(...).FirstOrDefault();
    if(2 == orderItem.SalesOrder.Category.OrderCategory){ ...}
    
  • To access Order Category information from Order instance

    Order order = _context.Orders.Where(...).FirstOrDefault();
    if(2 == order.Category.OrderCategory){ ...}
    

This is my schema:

Table : Order

  • ID (Primary Key)
  • Date
  • Amount
  • ItemCount
  • OrderCategoryInfo (FK - join with OrderCategoryDetails on OrderCategoryDetails.ID)

Table : OrderCategoryDetails

  • ID (Primary Key)
  • OrderCategory
  • OrderSubCategory

Table : OrderItem

  • OrderItem ID (Primary key)
  • Order ID (FK - Join with Order)

Database used: SQL Server 2008 R2


Solution

  • My general advice would be to ask yourself the following question; does every single row from the first table require a row from the second table? If the answer is yes then you might be better off de-normalising the data. If the answer is no you're probably better off keeping it as a seperate table.

    As long as you set up your foreign key association between the two tables you shouldn't concern yourself with performance implications of performing a join. It will only become an issue in pathological situations.

    Based upon your answers in the comments thread, I'd recommend that you should keep the tables separate and set up a foreign key relationship between the two.

    If you do get any performance problems further down the line, run a profiler on the problematic SQL and add any indexes that the profiler recommends, but only do this for queries that are used frequently. Indexes are great for speeding up queries but come at the cost of insert performance, so take care with them.