Search code examples
mysqlnormalizationdatabase-normalization

Should I normalize 20 millions row x 2 column MySQL table


I have a MySQL table which contains 20 millions of rows and 2 columns and structure like below.

item_id (unsigned int[11] PK) | item_name (varchar[50]) | image_path (varchar[50])

Should I separate image_path column to another table, like

image_id | image_path

What will be performance gain /loss? Will join requests faster than select * requests for only one table on this large data set?

My database only one table for now. If I normalize it will be two.

and my requests will be like

SELECT items.item_name, images.image_path 
FROM items 
LEFT JOIN images on images.image_id = items.image_id
WHERE item_id = 123

and I'm currently using

SELECT item_name, image_path FROM items WHERE item_id = 123

another hard option for me is MySQL Partitioning my table with RANGE Partitioning

What is the best way for performance?


Solution

  • In the query you show, you look up one row in items by primary key value 123, and then join to one row in images by primary key value. That's two primary key value lookups, when you could have done one.

    In MySQL's default storage engine, InnoDB, a NULL value doesn't take any storage space. So I would suggest if some of your item rows have an image_path and some don't, then just define image_path as nullable. That's bound to be better for performance.

    You also ask about PARTITION BY RANGE. With respect, 20 million rows is not a big table.

    In the databases I manage, I partitioned one table by date because it has 4.7 billion rows (i.e. 235x the size of your table). There are quite a few other tables that have millions of rows, but they are fine without being partitioned.

    As long as you use the index or primary key when you do lookups, you should be okay.