Search code examples
sqldatabase-designparameterization

Storing parameterized definitions of sets of elements and single pass queries to fetch them in SQL


Suppose a database table containing properties of some elements:

Table Element (let's say 1 000 000 rows):
ElementId    Property_1    Property_2    Property_3
-------      ----------    ----------    ----------
1            abc            1            1
2            bcd            1            2
3            def            2            4
...

The table is being frequently updated. I'd like to store definitions of sets of these elements so that using a single SQL statement I would get eg.

SetId    Element
---      -------
A        2
B        1
B        3
C        2
C        3
...

I'd also like to change the definitions when needed. So far I have stored the definitions of the sets as unions of intersections like this:

Table Subset (~1 000 rows):
SubsetId    Property    Value    Operator
--------    --------    -----    --------
1            1          bcd      =
1            3          1        >
2            2          3        <=
...

and

Table Set (~300 rows):
SetId    SubsetId
---      ------
...
E        3
E        4
F        7
F        9
...

In SQL I suppose I could generate lots of case expressions from the tables, but so far I've just loaded the tables and used an external tool to do essentially the same thing.

When I came up with this I was pleased (and also implemented it). Lately I've been wondering whether it is as wonderful as I thought. Is there a better way to store the definitions of the sets?


Solution

  • I would think using duck-typing may be intuitive here, as an alternative.

    For example all modern-languages (C#, Java, Python) have the concept of sets. If you are going to "intersect" or "union" (set operators) via SQL, then you have to store them in a relational way. Else, why not store them in a language native way ?. (as opposed to relational). By native way, I would mean that if it was done in Python and we used a Python set, then that is what I would persist. Same with Java or C#.

    So if a set-id 10 had the members 1,4,5,6 it would be persisted in the DB as follows:

          SetId              Set
    ______________________________________
    10                       1,4,5,6
    11                       2,3
    12                       null
    

    Sure, this has the disadvantage that it could be proprietary, or maybe even non-performant - which you can perhaps tell as you have the complete problem definition. If you need SQL to analyze it, maybe my suggestion has further downsides.

    In a sense, the set representation feature of each of these languages are like a DSL (Domain specific Language) - if you will need to 'talk' a lot of set-stuff between your application classes / objects, then why not use the natural fit?