Tags: sql, mysql, database, many-to-many, database-normalization

Is redundant data an acceptable trade-off in a normalized database structure?


In SQL, I'm considering the following problem:

I have a list of A_ids and a list of B_ids.

  • the number of unique A_ids is in the thousands
  • the number of unique B_ids is in the millions

The idea is that for each A_id I have a list of B_ids, potentially a long one (a many-to-many relationship).

I could simply store them in this format:

| a_id | b_ids |
| 1 | '1,2,3,4,5' |
| 2 | '1,2,4,5' |
| 3 | '1' |
| 4 | '1,2' |
| 5 | '3,4' |
| 6 | '2,3' |
...

However, I've read that normalizing, i.e. simply doing:

| a_id | b_id |
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 1 | 4 |
| 1 | 5 |
| 2 | 1 |
...

is better practice, but I fear the impact of having a huge number of rows (a billion or more).

I understand the drawbacks of each approach, but which is the better trade-off?
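The two layouts carry the same information, and moving from one to the other is mechanical. A minimal Python sketch, assuming the hypothetical `normalize` helper and `csv_rows` sample below (the sample is copied from the first table), that expands each comma-separated `b_ids` string into one `(a_id, b_id)` row per pair:

```python
# Hypothetical sample copied from the comma-separated layout in the question.
csv_rows = {
    1: "1,2,3,4,5",
    2: "1,2,4,5",
    3: "1",
    4: "1,2",
    5: "3,4",
    6: "2,3",
}


def normalize(rows):
    """Expand {a_id: 'b1,b2,...'} into sorted, de-duplicated (a_id, b_id) pairs."""
    pairs = set()  # a set mirrors a (a_id, b_id) primary key: no duplicate pairs
    for a_id, b_ids in rows.items():
        for token in b_ids.split(","):
            token = token.strip()
            if token:  # skip empty entries, e.g. from a trailing comma
                pairs.add((a_id, int(token)))
    return sorted(pairs)


pairs = normalize(csv_rows)
print(len(pairs))  # 16
print(pairs[:6])   # [(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (2, 1)]
```

The first six pairs are exactly the rows of the normalized table in the question; the other ten follow the same pattern.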


Solution

  • Normalisation is the route to follow

    1. For a modern DBMS, that’s not a particularly large number of rows
    2. Because you would index the table appropriately, each query reads only the rows it actually needs instead of doing a full table scan (unless the query genuinely requires one)
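To make the indexing point concrete, here is a minimal sketch under stated assumptions: Python's built-in `sqlite3` module stands in for MySQL so the example is self-contained, and the table name `a_b` and index name `idx_a_b_b_id` are hypothetical. A composite primary key on `(a_id, b_id)` serves lookups by `a_id`, a second index on `(b_id, a_id)` serves the reverse direction, and `EXPLAIN QUERY PLAN` confirms that both queries search an index rather than scan the table:

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE a_b (
        a_id INTEGER NOT NULL,
        b_id INTEGER NOT NULL,
        PRIMARY KEY (a_id, b_id)  -- lookups by a_id; also rejects duplicate pairs
    ) WITHOUT ROWID;
    -- Reverse direction: "which A_ids contain this B_id?"
    CREATE INDEX idx_a_b_b_id ON a_b (b_id, a_id);
    """
)

# Rows for a_id 1 and 2 from the question's example data.
conn.executemany(
    "INSERT INTO a_b (a_id, b_id) VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (2, 1), (2, 2), (2, 4), (2, 5)],
)


def plan(sql, params):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]


by_a = "SELECT b_id FROM a_b WHERE a_id = ? ORDER BY b_id"
by_b = "SELECT a_id FROM a_b WHERE b_id = ? ORDER BY a_id"

b_for_a2 = [r[0] for r in conn.execute(by_a, (2,))]
a_for_b4 = [r[0] for r in conn.execute(by_b, (4,))]
print(b_for_a2)  # [1, 2, 4, 5]
print(a_for_b4)  # [1, 2]
print(plan(by_a, (2,)))  # a SEARCH using the primary key
print(plan(by_b, (4,)))  # a SEARCH using idx_a_b_b_id
```

In MySQL with InnoDB the analogous design would be `PRIMARY KEY (a_id, b_id)` plus a secondary index on `(b_id, a_id)`; InnoDB clusters rows by primary key, which is roughly what `WITHOUT ROWID` does here. Actual plans and timings on a billion-row table will depend on the engine, its statistics, and the hardware.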