I got into databases and normalization. I am still trying to understand normalization and I am confused about its usage. I'll try to explain it with this example. Every day I collect data which would look like this in a single table:
TABLE: CAR_ALL
ID | DATE | CAR | LOCATION | FUEL | FUEL_USAGE | MILES | BATTERY |
---|---|---|---|---|---|---|---|
123 | 01.01.2021 | Toyota | New York | 40.3 | 3.6 | 79321 | 78 |
520 | 01.01.2021 | BMW | Frankfurt | 34.2 | 4.3 | 123232 | 30 |
934 | 01.01.2021 | Mercedes | London | 12.7 | 4.7 | 4321 | 89 |
123 | 05.01.2021 | Toyota | New York | 34.5 | 3.3 | 79515 | 77 |
520 | 05.01.2021 | BMW | Frankfurt | 20.1 | 4.6 | 123489 | 29 |
934 | 05.01.2021 | Mercedes | London | 43.7 | 5.0 | 4400 | 89 |
In this example I get data for thousands of cars every day. ID, CAR and LOCATION never changes. All the other data can have other values daily. If I understood correctly, normalizing would make it look like this:
TABLE: CAR_CONSTANT
ID | CAR | LOCATION |
---|---|---|
123 | Toyota | New York |
520 | BMW | Frankfurt |
934 | Mercedes | London |
TABLE: CAR_MEASUREMENT
GUID | ID | DATE | FUEL | FUEL_USAGE | MILES | BATTERY |
---|---|---|---|---|---|---|
1 | 123 | 01.01.2021 | 40.3 | 3.6 | 79321 | 78 |
2 | 520 | 01.01.2021 | 34.2 | 4.3 | 123232 | 30 |
3 | 934 | 01.01.2021 | 12.7 | 4.7 | 4321 | 89 |
4 | 123 | 05.01.2021 | 34.5 | 3.3 | 79515 | 77 |
5 | 520 | 05.01.2021 | 20.1 | 4.6 | 123489 | 29 |
6 | 934 | 05.01.2021 | 43.7 | 5.0 | 4400 | 89 |
I have two questions:
Does it make sense to create an extra table for DATE?
It is possible that new cars will be included through the collected data.
For every row I insert into CAR_MEASUREMENT
, I would have to check whether the ID is already in CAR_CONSTANT
. If it doesn't exist, I'd have to insert it.
But that means that I would have to check through CAR_CONSTANT
thousands of times every day. Wouldn't it be more efficient if I just insert the whole data as 1 row into CAR_ALL
? I wouldn't have to check through CAR_CONSTANT
every time.
The benefits of normalization are dependent on your specific use case. I can see both pros and cons to normalizing your schema, but its impossible to say which is better without more knowledge of your use case.
Pros:
Cons:
To answer your questions directly: