Search code examples
sqlperformancejoinforeign-keysvarchar

Performance of string comparison vs int join in SQL


It's accepted that searching a table on an int column is faster than on a string column (say varchar).

However, if I have a Shirt table with a Color column, would it be more performant to create a Color table with the primary key on that table being the foreign key on the Shirt table? Would the join negate the performance advantage of having the value in the Color column on Shirt being an int instead of a string value such as "Green" when searching for green Shirts?


Solution

  • Compared to the other operations being performed, it is unlikely that there is much performance difference between the two approaches. If you have only a handful of colors (up to a few hundred), the color table fits on a single page in most databases. An index on the color would make look up quite fast and not incur any I/O activity (after the first run to load the page).

    A string comparison depends on the database, but it does involve a function and reading the data from the page. So, it is not free. Different databases, of course, might have different performance characteristics for a string function.

    Where it should be stored should be a function of your application. Say you have an application where the color is going to be presented to the user. You might, one day, want to show the name of the color in Spanish, Swahili, or Chinese. If so, having a separate table makes such internationalization much easier. More prosaicly, you might want to prevent "Grene" from being entered, if so, having such a table makes a selection list easier.

    On the other hand, if performance is your only concern, it doesn't make a different. In other cases, it is actually possible for a lookup table to be faster than a denormalized table. This occurs when the strings are long, increasing the length of every record in a larger table. Larger tables mean more pages, which take longer to load into memory.