Search code examples
sql-serverstored-proceduresmultiple-resultsets

Preferred way of retrieving row with multiple relating rows


I'm currently hand-writing a DAL in C# with SqlDataReader and stored procedures. Performance is important, but it still should be maintainable...

Let's say there's a table recipes

(recipeID, author, timeNeeded, yummyFactor, ...)

and a table ingredients

(recipeID, name, amount, yummyContributionFactor, ...)

Now I'd like to query like 200 recipes with their ingredients. I see the following possibilities:

  • Query all recipes, then query the ingredients for each recipe.
    This would of course result in maaany queries.
  • Query all recipes and their ingredients in a big joined list. This will cause a lot of useless traffic, because every recipe data will be transmitted multiple times.
  • Query all recipes, then query all the ingredients at once by passing the list of recipeIDs back to the database. Alternatively issue both queries at one and return multiple resultsets. Back in the DAL, associate the ingredients to the recipes by their recipeID.
  • Exotic way: Cursor though all recipes and return for each recipe two separate resultsets for recipe and ingredients. Is there a limit for resultsets?

For more variety, the recipes can be selected by a list of IDs from the DAL or by some parametrized SQL condition.

Which one you think has the best performance/mess ratio?


Solution

  • If you only need to join two tables and an "ingredient" isn't a huge amount of data, the best balance of performance and maintainability is likely to be a single joined query. Yes, you are repeating some data in the results, but unless you have 100,000 rows and it's overloading the database server/network, it's too soon to be optimizing.

    The story is a little bit different if you have many layers of joins each with decreasing cardinality. For example, in one of my apps I have something like the following:

    Event -> EventType -> EventCategory
                       -> EventPriority
                       -> EventSource   -> EventSourceType -> Vendor
    

    A query like this results in a significant amount of duplication which is unacceptable when there are 100k events to retrieve, 1000 event types, maybe 10 categories/priorities, 50 sources, and 5 vendors. So in that case, I have a stored procedure that returns multiple result sets:

    • All 100k Events with just EventTypeID
    • The 1000 EventTypes with CategoryID, PriorityID, etc. that apply to these Events
    • The 10 EventCategories and EventPriorities that apply to the above EventTypes
    • The 50 EventSources that generated the 100k events
    • And so on, you get the idea.

    Because the cardinality goes down so drastically, it is much quicker to download only what is needed here and use a few dictionaries on the client side to piece it together (if that is even necessary). In some cases the low-cardinality data may even be cached in memory and never retrieved from the database at all (except on app start or when the data is changed).

    The determining factors in using an approach such as this are a very high number of results and a steep decrease in cardinality for the joins, in other words fanning in. This is actually the reverse of most usages and probably the reverse of what you are doing here. If you are selecting "recipes" and joining to "ingredients", you are probably fanning out, which can make this approach wasteful, especially if there are only two tables to join.

    So I'm just putting it out there that this is a possible alternative if performance becomes an issue down the road; at this point in your design, before you have real-world performance data, I would simply go the route of using a single joined result set.