TL;DR:
I want to use a non-relational design to store a tree of nodes in a self-referencing table because we will never need to relationally select subsets of data. This allows for extremely simple recursive storage and retrieval functions.
Coworker wants to use a relational design to store each specific field of the object -- I assume because he believes relational is simply always better. (he doesn't have any specific reasons) This would require more tables and more complex storage and retrieval functions, and I don't think it would serve to benefit us in any way.
Is there any specific benefits or pitfalls to either of the design methods?
How are trees normally stored in databases? Self referencing tables?
Are there any known samples of trees of data being stored in databases that might coincide with the task we are trying to solve?
At work we are using a complex structure to describe an object, unfortunately I cannot share the exact structure because of work restrictions but I will give an equivalent example of the structure and explain the features of it.
The structure can be represented in json but actually conforms to a much tighter syntax restriction.
There is four kinds of entities in the structure:
The top level container is always a json object that contains 4 attributes and exactly 1 container called 'main_container'
All containers must contain a single attribute called 'container_attribute'.
All patterns must contain exactly three attributes
An example of a structure in json looks like the following:
{
"top_level_node": {
"meta_info_1": "meta_info_keyword1",
"meta_info_2": "meta_info_keyword2",
"meta_info_3": "meta_info_keyword3",
"meta_info_4": "unique string of data",
"main_container": {
"container_attribute": "container_attribute_keyword",
"sub_container_1": {
"container_attribute": "container_attribute_keyword",
"pattern_1": {
"pattern_property_1": "pattern_property_1_keyword",
"pattern_property_2": "pattern_property_2_keyword",
"pattern_property_3": "unique string of data"
},
"pattern_2": {
"pattern_property_1": "pattern_property_1_keyword",
"pattern_property_2": "pattern_property_2_keyword",
"pattern_property_3": "unique string of data"
}
},
"pattern_3": {
"pattern_property_1": "pattern_property_1_keyword",
"pattern_property_2": "pattern_property_2_keyword",
"pattern_property_3": "unique string of data"
}
}
}
}
We want to store this structure in our internal office database and I am suggesting that we use three tables, one to store all json objects in a self-referencing table and one to store all json strings in a table that references the json object table, and then a third table to tie the top level containers to an object name.
The schema would look something like this:
Where an attibutes table would be used to store everything that is a json string with references to parent container id:
CREATE TABLE attributes (
id int DEFAULT nextval('attributes_id_seq'::text),
name varchar(255),
container_id int,
type int,
value_type int,
value varchar(255)
);
The containers table would be used to store all containers in a self-referencing table to create the 'tree' structure:
CREATE TABLE containers (
id int DEFAULT nextval('containers_id_seq'::text),
parent_container_id int
);
And then a single list of object names that point to the top level container id for the object:
CREATE TABLE object_names (
id int DEFAULT nextval('object_names_id_seq'::text),
name varchar(255),
container_id int
);
The nice thing about the above structure is it makes for a really simple recursive function to iterate the tree and store attributes and containers.
The downside is it's not relational whatsoever and therefore doesn't help to perform complex relational queries to retrieve sets of information.
The reason I say we should use this is because we have absolutely no reason to select pieces of these objects in a relational manner, the data on each object is only useful in the context of that object and we do not have any situations where we will need to select this data for any reason except rebuilding the object.
However my coworker is saying that we should be using a relational database design to store this, and that each of the 'keyword' attributes should have it's own table (container keyword table, 3 pattern keyword tables, 4 top level keyword tables).
The result is storing these objects in the suggested relational design becomes significantly more complex and requires many more tables.
Note that query speed/efficiency is not an issue because this object/database is for internal use for purposes that are not time-sensitive at all. Ultimately all we are doing with this is creating new 'objects' and storing them and then later querying the database to rebuild all objects.
If there is no benefit to a relational database design then is there any reason to use it over something that allows for such a simple storage/retrieval API?
Is there any significant issues with my suggested schema?
"we will never need to X" is a rather bold assumption that turns out to be unwarranted more often than you might suspect. And in fact with tree structures in particular, it is most natural for the requirement to arise to "zoom into a node" and treat that as a tree in its own right for a short time.
EDIT
And in case it wasn't clear why that matters : relational aproaches tend to offer more flexibility because such flexibility is built into the data structure. Non-relational approaches (typically implying that everything is solved in code) tend to lead to additional rounds of codeshitting once requirements start to evolve.