Search code examples
mysqldatabasedatabase-design

When to add a column vs adding a related table?


I have a big table with over 100 million rows. I have been trimming it down for months getting rid of bad data (rows wise), trying to keep it small. I already had 9 columns on this table. I want to add a new boolean column to it. Below is the current state.

enter image description here

This table started off small, and now its getting pretty wide. Yet again, I am tasked with adding more information per row. This time it's a new boolean field. I expect this field to be low volume, meaning less than 10% will have this set to true. I know I can make it default null, and it is a boolean column which should be small.

However, I wanted to get some advice. This table cannot get infinitely wide, and I will need to work around this. Under these circumstances, does it make more sense to create another table and foreign key reference the record when I have additional data to add? How do the pro's handle this in database design?

The best situation for usability is to have all data on the record so any form of a query can get or calculate on the table itself without joins. I just do not have confidence that it will scale to 1 BILLION rows (insert meme).


Solution

  • I have worked at some DBA jobs where I supported MySQL instances that had multi-billion row tables. At that scale, care must be taken to optimize queries properly. You don't want to do a table-scan at that scale.

    But that's about rows, not columns. You asked first about columns.

    The way to choose between adding a column versus adding another table is to follow rules of database normalization. If the new column is for an attribute of the same entity as your current table, add the column to that table. If it's a multi-valued attribute or if it's really an attribute of some other entity, then add it to a different table.

    Very, very rarely is it the right choice to make another table solely for the sake of having too many columns. A given MySQL table can have dozens of columns pretty easily, and hundreds if you're careful.

    In theory, there is no limit to the number of columns that might be appropriate to put in the same table with respect to normalization. But there are limitations due to the code to store those columns in a given implementation (e.g. InnoDB storage engine in MySQL).

    See https://www.percona.com/blog/2013/04/08/understanding-the-maximum-number-of-columns-in-a-mysql-table/

    So the maximum number of columns for a table in MySQL is somewhere between 191 and 2829, depending on a number of factors.

    In the comments on that blog, I was able to design a table that failed to be created at 59 columns. Read the blog for details.