I have a connection-object browser that I want to allow a user to view of various data sources they are connected to.The viewer of the objects look something like this:
Connection: Remote.1234.MySQL (3 level source)
Connection: Remote.abc.ElasticSearch (2 level source)
Connection: Local.xyz.MongoDB (3 level source)
Connection: Local.xyz.SQLServer (4 level source)
In other words, a 'Source' is a hierarchy of a known number of levels and a known 'name' for each level. While the entire hierarchy is variable, the hierarchy of any given source will always have the same number of levels and name. What might be a good way to model this relationally? My thought was to have the following:
Connection:
SourceType:
SourceTypeLevelMapping:
ThreeLevelSource_Level1: # e.g., Database
ThreeLevelSource_Level2: # e.g., Table
ThreeLevelSource_Level3: # e.g., Field
Then do the same for the other level-ed hierarchies:
So basically define the known hierarchies and for each new source that we add we would attach it to one of the known hierarchy levels. The alternative approach I was thinking of doing is to create a new hierarchy for each new source, but then we would be looking at literally hundreds of tables if we were to allow access to 25-50 sources.
What might be a good way to model this type of hierarchical data?
(Also, yes I am familiar with the existing general approaches for modeling hierarchical data as delineated here -- What are the options for storing hierarchical data in a relational database?, How can you represent inheritance in a database? -- the below is not a duplicate.)
Responding to the relational-database
and hierarchic-data
tags, the latter being pedestrian in the former.
Due to the requirement for, and the difference between:
Thus I use Record
and Field
to cover all possibilities, instead of the Relational terms, which would convey Relational definitions.
All possibilities are catered for, but a Relational and SQL-compliant approach (eg. MS SQL Server) is taken as the best method, due to its 40-year establishment and maturity, and the absence of alternatives.
DataSource
.This solution is 100% Relational: Codd's Relational Model, not the substandard alternatives marketed by the academics as "relational":
It can be implemented in any SQL compliant Platform
It has Relational Integrity (which is logical, beyond Referential Integrity, which is SQL and physical); Relational Power; and Relational Speed.
All update interaction is simple, via SQL ACID Transactions.
No warranties are given for pretend "SQLs" and non-SQLs.
I appreciate that as a developer, your focus in on the data values and how to retrieve it. However, two levels of definition are required first, in order to support the third data level:
Catalogue Potential
Blue (Reference cluster).
The DtaSources and definition that is available in the market, which the organisation might use. Let's say 42, as per your descriptions.
developer
, not an user_admin
, because the set up it critical (the lower levels depend on it), and it describes the physical capability and limitations of each DataSource.Catalogue Actual
Green (Identification cluster).
The DataSources and definition that are actually contracted and used by the organisation. Let's say 12. At this point we have connection addresses; ports; and users
. It is constrained CataloguePotential, directly, and via CHECKS
that call Functions.
This level defines the content (the tables that actually exist), it contains no data values.
Maintaining an SQL mindset, because that would be the most prudent, given that it is an established Standard, with 40 years of maturity, because it gives us the most flexibility: the CatalogueActual forms the SQL Catalogue.
Likewise, I have used the terms Record
and Field
for the objects in the collective, rather than Table
and Column
, which would imply Relational and SQL meanings.
SQL Platform
This level can be populated automatically by the program querying the SQL Catalogue.
"SQL" applications and non-SQL suites
Population is manual due to the absence of a Catalogue. It can be done by an user_admin
. The constraint would be your program attempting a trial query to validate the user-supplied table definition.
Current Data
Yellow (Transaction cluster)
The current data, that the user
has queried from the DataSources, via his Connection, for the webpage. The assumption is, I have taken the user::webpage to be central, and governing (one user
per Connection; one user
per webpage), not the OO Object.
You need:
Simple Hierarchy
a single-parent hierarchy to replicate the fixed levels of definition in the Catalogue in the SQL servers, as well as the variable levels in the constructed catalogue for the pretend "SQLs" and the non-SQLs.
Record
or a Struct
(for those in the collective that allow one).
RecordType
and FieldType
Record
, which is an instance of RecordType
, and actual Field
, which is a narrower definition of FieldType
.Method/Struct
In order to handle a Struct
, which in definition terms is identical to a Record
, and to allow a Struct
to contain a Struct
, we need a level of abstraction, which is ...
Article
is either
Field
, which is the atomic unit of storage, xorStruct
, which contains Articles
that requires an Exclusive Subtype cluster, fully defined along with SQL implementation details, in the Subtype doc
Method/Array
To support an Array
of Fields
:
Field
, thus implemented as child tables.
NumElement
is 1.Field
that is otherwise required for scalars redundant.This is the progress after seven iterations. It shows the Table-Relation level (the Attribute level is too large for an inline graphic).
Assumption
That the JS (or whatever) objects are local to the webpage/user. If your objects are global, the value tables need to be constrained to Connection.
The data model is given in a single PDF:
All my data models are rendered in IDEF1X, available from the early 1980's, the one and only notation for Relational Data Modelling, the Standard since 1993.
The IDEF1X Introduction is essential reading for those who are new to Codd's Relational Model, or its modelling method. Note that IDEF1X models complete, they are rich in detail and precision, showing all required details, whereas a home-grown model, being unaware of the imperatives of the Standard, have far less definition. Which means, the notation needs to be fully understood.