I have many tables, with millions of lines, with MySQL. Those tables are used to store log lines. I have a field "country" in VARCHAR(50). There is an index on this column. Would it change the performances a lot to store a countryId in INT instead of this country field ?
Thank you !
Your question is a bit more complicated than it first seems. The simple answer is that Country
is a string up to 50 characters. Replacing it by a 4-byte integer should reduce the storage space required for the field. Less storage means less I/O overhead in processing the query and smaller indexes. There are outlier cases of course. If country
typically has a NULL
value, then the current storage might be more efficient than having an id.
It gets a little more complicated, though, when you think about keeping the field up-to-date. One difference with a reference table is that the countries are now standardized, rather than being ad-hoc names. In general, this is a good thing. On the other hand, countries do change over time, so you have to be prepared to add a "South Sudan" or "East Timor" now and then.
If your database is heavy on inserts/updates, then changing the country field requires looking in the reference table for the correct value -- and perhaps inserting a new record there.
My opinion is "gosh . . . it would have been a good idea to set the database up this way in the beginning". At this point, you need to understand the effects on the application of maintaining a country reference table for the small performance gain of making the data structure more efficient and more accurate.