I've been reading about denormalization for the last 20 minutes but can't get a concise example with code.
Is this what denormalization is?
1. We have a normalized database:
Table_1:
customer_id (Primary key)
country
city
street
house_number
Table_2:
product_id (Primary Key)
customer_id (Foreign key)
product_storage_building
Table_3:
product_id (Foreign Key)
product_name
product_color
product_origin
However, joining all three tables is taking far too long to run let's say
SELECT a.*, b.*, c.*
FROM
TABLE_1 AS a
LEFT JOIN TABLE_2 AS b
ON a.customer_id = b.customer_id
LEFT JOIN TABLE_3 AS c
ON b.product_id = c.product_id
So I create a new table out of Table_1
and Table_2
CREATE OR REPLACE TABLE Denormalized_Data AS
(
SELECT customer_id,
country,
city,
street,
house_number,
product_id,
product_storage_building
FROM Table_1
LEFT JOIN Table_2
ON Table_1.cusomter_id = Table_2.customer_id
)
Then join to Table_3 as follows
SELECT customer_id,
country,
city,
street,
house_number,
product_storage_building,
Denormalized_Data.product_id
product_name,
product_color,
FROM Denormalized_Data
LEFT JOIN Table_3
ON Denormalized_Data.product_id = Table_3.product_id
Now this will make my query run faster - can the whole process above be described as denormalization?
Thanks
Yes, you're showing one type of denormalization.
There are three types of denormalization:
JOIN
.SUM()
or COUNT()
or MAX()
or others, so you don't have to use queries with GROUP BY
.You're showing an example of the first type. At least you can avoid one of the two joins you intend to do.
Why not make the denormalized table store the result of joining all three tables?
What's the downside of using denormalization? You are now storing data redundantly: once in the normalized tables, and a copy in the denormalized table. Suppose you get into work tomorrow and find that the data in these different tables doesn't exactly match up. What happened?
How can you tell what happened? Which table is "correct"? This is the risk of denormalization.