
How many joins are feasible in practice?


This question might be better suited to programmers.stackexchange. If so, please migrate.

I am currently pondering the complexity of typical data models. Everybody knows that data models should be normalized, but a normalized data model requires quite a few joins to reassemble the data later, and joins are potentially expensive operations, depending on the size of the tables involved. So the question I am trying to answer is: how does one usually approach this tradeoff? That is, how many joins would you find acceptable in typical queries when designing a data model? I am especially interested in the case of multiple joins within a single query.

As an example, let's say we have users, who own houses, in which there are rooms, which have drawers, which contain items. Normalizing this in the straightforward way, with one table each for users, houses, rooms, drawers, and items, would later require me to join five tables to get all the items belonging to a certain user. This seems like an awful lot of complexity to me.
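
To make the example concrete, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The table layout, column names, sample rows, and the helper `items_of` are all assumptions made up for illustration; a real schema would differ.

```python
import sqlite3

# Hypothetical schema for the users -> houses -> rooms -> drawers -> items chain.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users   (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE houses  (id INTEGER PRIMARY KEY, user_id   INTEGER REFERENCES users(id));
    CREATE TABLE rooms   (id INTEGER PRIMARY KEY, house_id  INTEGER REFERENCES houses(id));
    CREATE TABLE drawers (id INTEGER PRIMARY KEY, room_id   INTEGER REFERENCES rooms(id));
    CREATE TABLE items   (id INTEGER PRIMARY KEY, drawer_id INTEGER REFERENCES drawers(id), name TEXT);

    INSERT INTO users   VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO houses  VALUES (1, 1), (2, 2);
    INSERT INTO rooms   VALUES (1, 1), (2, 2);
    INSERT INTO drawers VALUES (1, 1), (2, 2);
    INSERT INTO items   VALUES (1, 1, 'spoon'), (2, 1, 'fork'), (3, 2, 'pen');
""")

def items_of(user_name):
    """Return the names of all items owned by one user, via a five-table join."""
    rows = conn.execute("""
        SELECT i.name
        FROM users u
        JOIN houses  h ON h.user_id   = u.id
        JOIN rooms   r ON r.house_id  = h.id
        JOIN drawers d ON d.room_id   = r.id
        JOIN items   i ON i.drawer_id = d.id
        WHERE u.name = ?
        ORDER BY i.name
    """, (user_name,)).fetchall()
    return [name for (name,) in rows]
```

Each join follows a single foreign key, so the query is long but mechanical; whether it is expensive depends on table sizes and indexes, not on the join count alone.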

Most likely the size of the tables matters, too. Joining five tables with little data is not as bad as joining three tables with millions of rows. Or is this consideration wrong?


Solution

  • There are good reasons for database normalization, and I've seen queries joining more than 20 tables and sub-queries work just fine for a long time. I find normalization to be a huge win, as it allows me to add new features to existing, working applications without affecting the parts that already work.

    Databases come with different features to make your life easier:

    • you can create views for the most commonly used queries (although this is not the only use case for views);
    • some RDBMSs provide Common Table Expressions (CTEs), which allow you to use named sub-queries and also recursive queries;
    • some RDBMSs provide extension languages (like PL/SQL or PL/pgSQL), which allow you to develop your own functions to hide the complexity of your schema and use only API calls to operate on your data.
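
As a rough illustration of the CTE point, the sketch below names a multi-join sub-query once and then uses it like a table. It runs against SQLite through Python's sqlite3 module; the tables, the CTE name `user_drawers`, and the function `item_counts_per_user` are hypothetical and exist only for this example. Recursive CTEs are not shown.

```python
import sqlite3

# Hypothetical, reduced schema: houses belong to users, rooms to houses, and so on.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE houses  (id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE TABLE rooms   (id INTEGER PRIMARY KEY, house_id INTEGER);
    CREATE TABLE drawers (id INTEGER PRIMARY KEY, room_id INTEGER);
    CREATE TABLE items   (id INTEGER PRIMARY KEY, drawer_id INTEGER, name TEXT);

    INSERT INTO houses  VALUES (1, 1), (2, 2);
    INSERT INTO rooms   VALUES (1, 1), (2, 1), (3, 2);
    INSERT INTO drawers VALUES (1, 1), (2, 2), (3, 3);
    INSERT INTO items   VALUES (1, 1, 'spoon'), (2, 2, 'sock'), (3, 3, 'pen');
""")

def item_counts_per_user():
    """Count items per user; the CTE gives the drawer-lookup joins a name."""
    return conn.execute("""
        WITH user_drawers AS (
            SELECT h.user_id, d.id AS drawer_id
            FROM houses h
            JOIN rooms   r ON r.house_id = h.id
            JOIN drawers d ON d.room_id  = r.id
        )
        SELECT ud.user_id, COUNT(i.id)
        FROM user_drawers ud
        JOIN items i ON i.drawer_id = ud.drawer_id
        GROUP BY ud.user_id
        ORDER BY ud.user_id
    """).fetchall()
```

The outer query reads as a single join even though three more are hidden inside the named sub-query.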

    A while back there was a somewhat related question, "How does a SQL statement containing multiple joins work?" It might be worthwhile to look into it as well.

    Developing an application with a normalized database is easier because, with a proper approach, you can isolate your schema behind views and functions and make your application code immune to schema changes. If you go for a denormalized design, design changes may affect a great deal of your code, as denormalized systems tend to be heavily optimized for performance at the cost of flexibility.
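
One way to picture this isolation is the sketch below, again using SQLite via Python's sqlite3 module. Everything in it is an assumption for illustration: the tables, the view name `user_items`, and the function `app_query` standing in for application code. The application only ever queries the view, so when the schema gains a `rooms` level, only the view definition is rewritten.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE houses (id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE TABLE items  (id INTEGER PRIMARY KEY, house_id INTEGER, name TEXT);
    INSERT INTO users  VALUES (1, 'alice');
    INSERT INTO houses VALUES (1, 1);
    INSERT INTO items  VALUES (1, 1, 'spoon');

    -- The view is the only object the application code knows about.
    CREATE VIEW user_items AS
        SELECT u.name AS user_name, i.name AS item_name
        FROM users u
        JOIN houses h ON h.user_id  = u.id
        JOIN items  i ON i.house_id = h.id;
""")

def app_query(user_name):
    """Stand-in for application code: it queries the view, never the base tables."""
    rows = conn.execute(
        "SELECT item_name FROM user_items WHERE user_name = ? ORDER BY item_name",
        (user_name,),
    ).fetchall()
    return [row[0] for row in rows]

before = app_query("alice")

# Schema change: items now live in rooms instead of directly in houses.
conn.executescript("""
    DROP VIEW user_items;
    CREATE TABLE rooms (id INTEGER PRIMARY KEY, house_id INTEGER);
    INSERT INTO rooms VALUES (1, 1);
    CREATE TABLE room_items (id INTEGER PRIMARY KEY, room_id INTEGER, name TEXT);
    INSERT INTO room_items SELECT id, 1, name FROM items;
    DROP TABLE items;

    -- Same view name and columns, new joins underneath.
    CREATE VIEW user_items AS
        SELECT u.name AS user_name, i.name AS item_name
        FROM users u
        JOIN houses     h ON h.user_id  = u.id
        JOIN rooms      r ON r.house_id = h.id
        JOIN room_items i ON i.room_id  = r.id;
""")

after = app_query("alice")  # app_query itself was not touched
```

The extra join introduced by the schema change is absorbed by the view, which is the kind of isolation described above.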