Suppose a database table containing properties of some elements:
Table Element (let's say 1 000 000 rows):
ElementId Property_1 Property_2 Property_3
------- ---------- ---------- ----------
1 abc 1 1
2 bcd 1 2
3 def 2 4
...
The table is being frequently updated. I'd like to store definitions of sets of these elements so that using a single SQL statement I would get eg.
SetId Element
--- -------
A 2
B 1
B 3
C 2
C 3
...
I'd also like to change the definitions when needed. So far I have stored the definitions of the sets as unions of intersections like this:
Table Subset (~1 000 rows):
SubsetId Property Value Operator
-------- -------- ----- --------
1 1 bcd =
1 3 1 >
2 2 3 <=
...
and
Table Set (~300 rows):
SetId SubsetId
--- ------
...
E 3
E 4
F 7
F 9
...
In SQL I suppose I could generate lots of case expressions from the tables, but so far I've just loaded the tables and used an external tool to do essentially the same thing.
When I came up with this I was pleased (and also implemented it). Lately I've been wondering whether it is as wonderful as I thought. Is there a better way to store the definitions of the sets?
I would think using duck-typing may be intuitive here, as an alternative.
For example all modern-languages (C#, Java, Python) have the concept of sets. If you are going to "intersect" or "union" (set operators) via SQL, then you have to store them in a relational way. Else, why not store them in a language native way ?. (as opposed to relational). By native way, I would mean that if it was done in Python and we used a Python set, then that is what I would persist. Same with Java or C#.
So if a set-id 10 had the members 1,4,5,6 it would be persisted in the DB as follows:
SetId Set
______________________________________
10 1,4,5,6
11 2,3
12 null
Sure, this has the disadvantage that it could be proprietary, or maybe even non-performant - which you can perhaps tell as you have the complete problem definition. If you need SQL to analyze it, maybe my suggestion has further downsides.
In a sense, the set representation feature of each of these languages are like a DSL (Domain specific Language) - if you will need to 'talk' a lot of set-stuff between your application classes / objects, then why not use the natural fit?