I'm building a management application to help manage my mobile auto detailing company (and hopefully others). I'm struggling to figure out how to model some of the data.
This question is related to a previous question that I've posted, but I've reproduced the relevant information below: Database design - google app engine
In this application, there are concepts of "Appointments" and "Line Items."
Appointments are a place and time where employees are expected to be in order to deliver a service.
Line Items are a service, fee or discount and its associated information. An example of line items that might go into an appointment:
Name: Price: Commission: Time estimate Full Detail, Regular Size: 160 75 3.5 hours $10 Off Full Detail Coupon: -10 0 0 hours Premium Detail: 220 110 4.5 hours Derived totals(not a line item): $370 $185 8.0 hours
In my previous implementation of this application, Line Items were contained by a single appointment. This worked fine most of the time, but caused problems sometimes. An example would be if an appointment got interrupted half-way through because of rain and the technician had to come back out the next day and finish up. This situation required two appointments for the same line item. In cases like this, I would just fudge the data a little by setting the "line item" on the second appointment to read something like "Finish Up" and then the cost would be $0.
In this next version, I am considering enabling Line Items to be matched with more than one appointment with a table structure that looks like this:
Appointment
start_time
etc...
Line_Item
appointment_Key_List
name
price
etc...
A general problem with this structure is that it is complicated and I'm not even sure if its appropriate to match one line item with multiple appointments. If Line Items can only be part of one Appointment, then I can actually just put a list of line items IN each Appointment, when I get Appointments, I'd already be getting Line Items.
A more specific problem is that I am using google app engine and if I want to query for a set of appointments and their associated line items, I'd have to first query for the set of appointments and then do a second query for the line items using the IN operator to test if any of the Line_Item's appointment keys fall into the set of appointment keys the were returned from the previous query. The second query will fail if I have more than 30 keys requiring me to shard the query. I could denormalize the data to avoid this complicated and extensive read query, and I will probably have to denormalize to some degree anyway, but I'd rather avoid complexity where appropriate.
My question is how is this type of situation usually modeled? Is it even appropriate for a Line Item to be paired with more than one appointment, or is it normal to simply split line items into separate ones for each appointment such as "1st half of 2 day job" and "2nd half of two day job." How do similar successful applications do this? What are the rules of thumb in this type of situation? What implementations have turned out to be less problematic?
Thanks!
The approach you're suggesting will work fine; you can model the line item's 'appointment_Key_list' as a list property and it will work as you expect. You don't have to use the IN operator - that's for matching a single value in the datastore against a list of keys you have (eg, "WHERE datastore_column IN ('a', 'b', 'c')), while you're doing the reverse - matching a single value against a list in the datastore.
I would suggest, though, that the reverse might be better suited to your task: Have each Appointment have a list of line item keys. This operates much the same way, but to retrieve all the data on an appointment, you instead first fetch the appointment, then do a bulk get on the line items, using the keys from the Appointment entity. If you know the key of the Appointment, you're thus avoiding the need to do any queries at all.
I've been trying to explain to Pindatjuh why querying a list property is no less efficient than a singly-valued one, but apparrently a more detailed description is required, so without any further ado, here is...
Although Python and Java provide various high level interfaces to the datastore, the datastore itself speaks a lower-level abstraction, called entities. An entity consist of the following:
The primary key is the Datastore key you're already familiar with. The list of (name, value) pairs is App Engine's representation for the data in your entity. So far so straightforward. An entity with the following values:
a_string = "Hello, world"
an_int = 123
would be serialized to something resembling this:
[('a_string', 'Hello, world'), ('an_int', 123)]
But how does this interact with lists? Well, lists are treated as 'multiply valued' properties. That is, a list with n items is stored as n separate properties. An example probably makes this clearer:
a_string = "Hello, world"
an_int = 123
a_list_of_ints = [42, 314, 9]
will be serialized as:
[('a_string', 'Hello, world'), ('an_int', 123), ('a_list_of_ints', 42), ('a_list_of_ints', 314), ('a_list_of_ints', 9)]
As you can see, the list gets represented a series of values, all with the same name. When you load data from the datastore, the SDK sees the repeated value and turns it into a list.
Where this gets important is when it interacts with indexing. Suppose you have an index on 'a_string' and 'an_int'. When you insert or modify a value, App Engine generates a set of index entries for it; for the above index and the above entity, it generates a single row in the index that looks something like this:
('Hello, world', 123, a_key)
('a_key' here is a placeholder for the key of the original entity.) When you do a query that uses this index, it just needs to do a lookup on the index to find rows with the appropriate prefix (Eg, 'SELECT * FROM Kind WHERE a_string = "Hello, world" ORDER BY an_int').
When you index a list, though, App Engine inserts multiple index rows. An index on 'an_int' and 'a_list_of_ints' would generate these rows for the above entity:
(123, 42, a_key)
(123, 314, a_key)
(123, 9, a_key)
Again, querying works the same as it did previously - App Engine just has to look up the row with the correct prefix in the index. The number of entries in the list has no impact on how fast the query is - only on how long it took to generate and write the index entries. In fact, the query planner has no idea that 'a_list_of_ints' is a multiply valued property - it just treats it like any other index entry.
So in a nutshell: