Search code examples
databasenormalizationdatabase-normalizationdenormalization

How adding redundant data can get rid of complex join in Denormalization?


I was going through Denormalizaition definition and I am wondering how adding redundant data can get rid of complex join ?


Solution

  • Consider the following two relations:

    Department (department_id PK, department_name, manager_employee_id)
    Employee (employee_id PK, employee_name, department_id)
    

    To get a list of employee names together with their managers' names, we would need a query like the following:

    SELECT e.employee_name, m.employee_name AS manager_name
    FROM Employee AS e
    INNER JOIN Department AS d ON e.department_id = d.department_id
    INNER JOIN Employee AS m ON d.manager_employee_id = m.employee_id
    

    Now, if we denormalize the transitive functional dependency employee_id -> department_id -> manager_employee_id -> employee_name into the Employee table, our tables could look like this:

    Department (department_id PK, department_name, manager_employee_id)
    Employee (employee_id PK, employee_name, department_id, manager_employee_id, manager_name)
    

    Practically, this might not be a good design since the redundancies could result in data consistency if it's not carefully kept in sync. However, we'll ignore that risk for the purpose of this example.

    With the new tables, we can now get the list of employee and manager names like this:

    SELECT employee_name, manager_name
    FROM Employee
    

    Redundancy can help some queries to be expressed more simply, but it can introduce a risk of data inconsistency which needs to be carefully considered and controlled. Redundancy also requires more storage space and can make queries slower, since fewer rows can fit into a page on disk. On the other hand, it can make some queries faster by eliminating joins or calculations, and it can even be used to enforce consistency. Each case needs to be considered on its own merits.