I have worked with various applications and encountered this situation many times. Until now I have not figured out what is the best approach.
Here's the scenario:
GeneralDetails
table:
| DocumentID | DateCreated | Owner |
| 1 | 07/07/07 | Naruto |
| 2 | 08/08/08 | Goku |
| 3 | 09/09/09 | Taguro |
ItemDetails
table
| DocumentID | Item | Quantity |
| 1 | Marbles | 20 |
| 1 | Cards | 56 |
| 2 | Yo-yo | 1 |
| 2 | Chess board | 3 |
| 2 | GI Joe | 12 |
| 3 | Rubber Duck | 1 |
As you can see, the tables have a one-to-many relationship. Now, in order to retrieve all the documents and their respective items, I always do either of the two:
Method 1 - Many round trips (pseudo-code):
Documents = GetFromDB("select DocumentID, Owner " +
"from GeneralDetails")
For Each Document in Documents
{
Display(Document["CreatedBy"])
DocumentItems = GetFromDB("select Item, Quantity " +
"from ItemDetails " +
"where DocumentID = " + Document["DocumentID"] + "")
For Each DocumentItem in DocumentItems
{
Display(DocumentItem["Item"] + " " + DocumentItem["Quantity"])
}
}
Method 2 - Much irrelevant data (pseudo-code):
DocumentsAndItems = GetFromDB("select g.DocumentID, g.Owner, i.Item, i.Quantity " +
"from GeneralDetails as g " +
"inner join ItemDetails as i " +
"on g.DocumentID = i.DocumentID")
//Display...
I used the first method when I was in college for desktop applications, the performance was not bad so I realized it was okay.
Until one day, I saw an article "Make the web faster", it says that many round trips to the database is bad; so ever since then I have used the second method.
On the second method, I avoided round trips by using inner join to retrieve the first and the second table at once, but it produces unecessary or redundant data. See the result set.
| DocumentID | Owner | Item | Quantity |
| 1 | Naruto | Marbles | 20 |
| 1 | Naruto | Cards | 56 |
| 2 | Goku | Yo-yo | 1 |
| 2 | Goku | Chess board | 3 |
| 2 | Goku | GI Joe | 12 |
| 3 | Taguro | Rubber Duck | 1 |
The result set has redundant DocumentID
and Owner
. It looks like an unnormalized database.
Now, the question is, how do I avoid round trips and at the same time avoid redundant data?
The method used by ActiveRecord and other ORMs is to select the first table, batch together the IDs and then use those IDs in an IN clause for the second select.
SELECT * FROM ItemDetails WHERE DocumentId IN ( [Comma Separated List of IDs here] )
Advantages:
Disadvantages:
Generally speaking, the first method is referred to as the "N+1 query problem" and the solutions are referred to as "eager loading". I tend to see your "Method 2" as preferable as the latency to the database generally trumps the size of the redundant data over the data transfer rate, but YRMV. As with almost everything in software, it's a tradeoff.