sql json database-design relational-database non-relational-database

Is A Relational Database Design Correct For Storing This Complex Structure?

TL;DR:

I want to use a non-relational design to store a tree of nodes in a self-referencing table because we will never need to relationally select subsets of data. This allows for extremely simple recursive storage and retrieval functions.

My coworker wants to use a relational design that stores each specific field of the object, I assume because he believes relational is simply always better (he hasn't given any specific reasons). This would require more tables and more complex storage and retrieval functions, and I don't think it would benefit us in any way.

Are there any specific benefits or pitfalls to either of the design methods?

How are trees normally stored in databases? Self-referencing tables?

Are there any known examples of trees of data being stored in databases that resemble the task we are trying to solve?

At work we are using a complex structure to describe an object. Unfortunately I cannot share the exact structure because of work restrictions, but I will give an equivalent example and explain its features.

The structure can be represented in json but actually conforms to a much tighter syntax restriction.

There are four kinds of entities in the structure:

top level node
- This node is a json object and it must be the top level json object
- This node must contain exactly 4 attributes (meta info 1 through 4)
- This node must contain exactly 1 'main' container node
container nodes
- These are json objects that contain other containers and pattern nodes
- Must contain exactly 1 attribute named 'container_attribute'
- May contain any number of other containers and patterns
pattern nodes
- These are json objects that contain exactly 3 attributes
- A pattern is technically a container
- May not contain anything else
attribute nodes
- These are just json string objects

The top level container is always a json object that contains 4 attributes and exactly 1 container called 'main_container'

All containers must contain a single attribute called 'container_attribute'.

All patterns must contain exactly three attributes.

An example of a structure in json looks like the following:

{
    "top_level_node": {
        "meta_info_1": "meta_info_keyword1",
        "meta_info_2": "meta_info_keyword2",
        "meta_info_3": "meta_info_keyword3",
        "meta_info_4": "unique string of data", 

        "main_container": {
            "container_attribute": "container_attribute_keyword",

            "sub_container_1": {
                "container_attribute": "container_attribute_keyword",

                "pattern_1": {
                    "pattern_property_1": "pattern_property_1_keyword",
                    "pattern_property_2": "pattern_property_2_keyword",
                    "pattern_property_3": "unique string of data"
                },

                "pattern_2": {
                    "pattern_property_1": "pattern_property_1_keyword",
                    "pattern_property_2": "pattern_property_2_keyword",
                    "pattern_property_3": "unique string of data"
                }
            },

            "pattern_3": {
                "pattern_property_1": "pattern_property_1_keyword",
                "pattern_property_2": "pattern_property_2_keyword",
                "pattern_property_3": "unique string of data"
            }
        }
    }
}
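
The rules above can be checked mechanically. The following is a minimal Python sketch, not part of the original post. The function names (`classify`, `validate_container`, `validate_top_level`) are hypothetical, and it assumes that an object is a container exactly when it has a `container_attribute` key and must otherwise be a pattern. `validate_top_level` expects the object stored under `"top_level_node"`.

```python
def classify(node):
    """Return 'attribute', 'container' or 'pattern'; raise ValueError otherwise."""
    if isinstance(node, str):
        return "attribute"
    if not isinstance(node, dict):
        raise ValueError(f"unexpected JSON type: {type(node).__name__}")
    # Assumption: the 'container_attribute' key is what marks a container.
    if "container_attribute" in node:
        return "container"
    # Anything else must be a pattern: exactly three string attributes.
    if len(node) == 3 and all(isinstance(v, str) for v in node.values()):
        return "pattern"
    raise ValueError("object is neither a container nor a pattern")


def validate_container(node):
    """Check one container and, recursively, every container below it."""
    if not isinstance(node, dict) or not isinstance(node.get("container_attribute"), str):
        raise ValueError("container needs a string 'container_attribute'")
    for key, child in node.items():
        if key == "container_attribute":
            continue
        kind = classify(child)  # raises for malformed patterns
        if kind == "attribute":
            raise ValueError(f"container has an extra attribute {key!r}")
        if kind == "container":
            validate_container(child)


def validate_top_level(node):
    """Check the four meta attributes and the single 'main_container'."""
    strings = [k for k, v in node.items() if isinstance(v, str)]
    others = [k for k in node if k not in strings]
    if len(strings) != 4:
        raise ValueError("top level node needs exactly 4 attributes")
    if others != ["main_container"]:
        raise ValueError("top level node needs exactly one 'main_container'")
    validate_container(node["main_container"])
```

A valid structure passes silently; any violation raises `ValueError` with a short description.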

We want to store this structure in our internal office database, and I am suggesting that we use three tables: a self-referencing table to store all json objects, a table to store all json strings that references the json object table, and a third table to tie the top level containers to an object name.

The schema would look something like this:

An attributes table would be used to store everything that is a json string, with a reference to the parent container id:

CREATE TABLE attributes (
    id                  int DEFAULT nextval('attributes_id_seq'::text),
    name                varchar(255),
    container_id        int,
    type                int,
    value_type          int,
    value               varchar(255)
);

The containers table would be used to store all containers in a self-referencing table to create the 'tree' structure:

CREATE TABLE containers (
    id                  int DEFAULT nextval('containers_id_seq'::text),
    parent_container_id int 
);

And then a single list of object names that point to the top level container id for the object:

CREATE TABLE object_names (
    id                  int DEFAULT nextval('object_names_id_seq'::text),
    name                varchar(255),
    container_id        int
);

The nice thing about the above structure is that it makes for a really simple recursive function to iterate the tree and store attributes and containers.
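
A minimal sketch of that recursive store and rebuild, written in Python against SQLite so that it runs standalone (the schema in the post is PostgreSQL). Two assumptions are baked in: a `name` column is added to `containers`, because the posted schema has nowhere to keep the key of a nested object, and `type` and `value_type` are left out because the post does not define them. All function names are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE containers (
    id                  INTEGER PRIMARY KEY,
    parent_container_id INTEGER REFERENCES containers(id),
    name                TEXT  -- assumption: not in the posted schema
);
CREATE TABLE attributes (
    id           INTEGER PRIMARY KEY,
    name         TEXT,
    container_id INTEGER REFERENCES containers(id),
    value        TEXT
);
CREATE TABLE object_names (
    id           INTEGER PRIMARY KEY,
    name         TEXT,
    container_id INTEGER REFERENCES containers(id)
);
"""


def store_node(conn, node, name=None, parent_id=None):
    """Insert one json object and, recursively, everything below it."""
    cur = conn.execute(
        "INSERT INTO containers (parent_container_id, name) VALUES (?, ?)",
        (parent_id, name),
    )
    container_id = cur.lastrowid
    for key, value in node.items():
        if isinstance(value, dict):
            store_node(conn, value, key, container_id)
        else:
            conn.execute(
                "INSERT INTO attributes (name, container_id, value) VALUES (?, ?, ?)",
                (key, container_id, value),
            )
    return container_id


def load_node(conn, container_id):
    """Rebuild the json object rooted at container_id."""
    node = dict(conn.execute(
        "SELECT name, value FROM attributes WHERE container_id = ? ORDER BY id",
        (container_id,),
    ).fetchall())
    children = conn.execute(
        "SELECT id, name FROM containers WHERE parent_container_id = ? ORDER BY id",
        (container_id,),
    ).fetchall()
    for child_id, child_name in children:
        node[child_name] = load_node(conn, child_id)
    return node


def store_object(conn, object_name, top_level_node):
    """Store a whole object and register its root under object_name."""
    root_id = store_node(conn, top_level_node)
    conn.execute(
        "INSERT INTO object_names (name, container_id) VALUES (?, ?)",
        (object_name, root_id),
    )
    return root_id


def load_object(conn, object_name):
    """Look up the root container by object name and rebuild the object."""
    row = conn.execute(
        "SELECT container_id FROM object_names WHERE name = ?", (object_name,)
    ).fetchone()
    return load_node(conn, row[0])
```

After `conn.executescript(SCHEMA)`, storing an object with `store_object` and reading it back with `load_object` returns an equal dictionary, although attributes come back ahead of nested objects, so key order may differ from the input.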

The downside is that it's not relational at all, so it doesn't help with complex relational queries that retrieve sets of information.

I say we should use this because we have no reason to select pieces of these objects relationally. The data on each object is only useful in the context of that object, and we will never need to select it for any reason other than rebuilding the object.

However, my coworker says that we should use a relational database design to store this, and that each of the 'keyword' attributes should have its own table (a container keyword table, 3 pattern keyword tables, and 4 top level keyword tables).
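
For comparison, one of those keyword tables might look like the sketch below. This is a guess at the shape of the proposal, not the coworker's actual design: the table name `container_keywords`, the column `container_keyword_id`, and the function names are all hypothetical, and SQLite is used through Python only so that the example runs standalone. What the extra table buys is that the database itself rejects a container whose keyword is not in the lookup table.

```python
import sqlite3


def make_db():
    """Create an in-memory database with one keyword lookup table."""
    conn = sqlite3.connect(":memory:")
    # SQLite leaves foreign key checks off unless asked; PostgreSQL enforces them.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE container_keywords (
            id      INTEGER PRIMARY KEY,
            keyword TEXT NOT NULL UNIQUE
        );
        CREATE TABLE containers (
            id                   INTEGER PRIMARY KEY,
            parent_container_id  INTEGER REFERENCES containers(id),
            container_keyword_id INTEGER NOT NULL REFERENCES container_keywords(id)
        );
    """)
    conn.execute(
        "INSERT INTO container_keywords (keyword) VALUES (?)",
        ("container_attribute_keyword",),
    )
    return conn


def add_container(conn, keyword_id, parent_id=None):
    """Insert a container; raises sqlite3.IntegrityError for an unknown keyword id."""
    cur = conn.execute(
        "INSERT INTO containers (parent_container_id, container_keyword_id) VALUES (?, ?)",
        (parent_id, keyword_id),
    )
    return cur.lastrowid
```

The cost is the one described in this question: eight such tables, plus a lookup or join on every store and rebuild.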

The result is that storing these objects in the suggested relational design becomes significantly more complex and requires many more tables.

Note that query speed/efficiency is not an issue because this object/database is for internal use for purposes that are not time-sensitive at all. Ultimately all we are doing with this is creating new 'objects' and storing them and then later querying the database to rebuild all objects.

If there is no benefit to a relational database design then is there any reason to use it over something that allows for such a simple storage/retrieval API?

Are there any significant issues with my suggested schema?

Solution

"We will never need to X" is a rather bold assumption that turns out to be unwarranted more often than you might suspect. With tree structures in particular, the requirement that most naturally arises is to "zoom into a node" and treat it as a tree in its own right for a short time.

EDIT

And in case it wasn't clear why that matters: relational approaches tend to offer more flexibility because that flexibility is built into the data structure. Non-relational approaches (typically implying that everything is solved in code) tend to lead to additional rounds of messy code rewrites once requirements start to evolve.
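
To illustrate the "zoom into a node" case: with a self-referencing table like the `containers` table in the question, a recursive common table expression can pull out one subtree in a single query. The sketch below uses Python with SQLite so that it runs standalone; PostgreSQL also supports `WITH RECURSIVE`. The `name` column, the sample rows, and the function names are assumptions for illustration, not part of the posted schema.

```python
import sqlite3

# Start at one container, then repeatedly join in the rows whose parent
# is already part of the result.
SUBTREE_SQL = """
WITH RECURSIVE subtree(id, parent_container_id, name, depth) AS (
    SELECT id, parent_container_id, name, 0
    FROM containers
    WHERE id = :root_id
    UNION ALL
    SELECT c.id, c.parent_container_id, c.name, s.depth + 1
    FROM containers AS c
    JOIN subtree AS s ON c.parent_container_id = s.id
)
SELECT id, name, depth FROM subtree ORDER BY depth, id
"""


def subtree(conn, root_id):
    """Return (id, name, depth) for root_id and all of its descendants."""
    return conn.execute(SUBTREE_SQL, {"root_id": root_id}).fetchall()


def demo():
    """Load the container tree from the question's example and zoom into node 3."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE containers ("
        "id INTEGER PRIMARY KEY, parent_container_id INTEGER, name TEXT)"
    )
    conn.executemany(
        "INSERT INTO containers VALUES (?, ?, ?)",
        [
            (1, None, "top_level_node"),
            (2, 1, "main_container"),
            (3, 2, "sub_container_1"),
            (4, 3, "pattern_1"),
            (5, 3, "pattern_2"),
            (6, 2, "pattern_3"),
        ],
    )
    return subtree(conn, 3)
```

Here `demo()` returns the rows for `sub_container_1`, `pattern_1` and `pattern_2` only, without loading the rest of the object into application code.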