I'm having trouble modeling data that has a parent table with a start and end date in its primary key, and a child table with a timestamp in its primary key that must fall within the range of the parent table's start and end dates. In fact, this problem is nested, as that parent table is actually the child to another table - a "grandparent" table - which also has start and end dates in its primary key; the parent table's start and end dates must likewise fit within the range of the grandparent table's start and end dates.
For background, I work at a water treatment company. We treat water by deploying water treatment machines to various sites as part of treatment contracts. In more specific terms:
Thus, we have to keep track of site
s, treatment_contract
s, machine_deployment
s, machine
s, and treatment_datapoint
s. A site
can have multiple treatment_contract
s, a treatment_contract
can have multiple machine_deployments
and multiple treatment_datapoint
s, and a machine
can have multiple machine_deployment
s.
So a simplified version of the data I'm trying to model is this:
CREATE TABLE public.site
(
id integer NOT NULL,
PRIMARY KEY (id)
);
CREATE TABLE public.treatment_contract
(
site_id integer NOT NULL,
start_date date NOT NULL,
end_date date,
PRIMARY KEY (site_id, start_date, end_date)
CONSTRAINT fk_treatment_contract__site FOREIGN KEY (site_id)
REFERENCES public.site (site_id) MATCH SIMPLE
);
CREATE TABLE public.machine_deployment
(
site_id integer NOT NULL,
machine_id integer NOT NULL,
start_date date NOT NULL,
end_date date,
PRIMARY KEY (site_id, machine_id, start_date, end_date),
CONSTRAINT fk_machine_deployment__machine FOREIGN KEY (machine_id)
REFERENCES public.machine (id) MATCH SIMPLE,
<some provision to require that machine_deployment.start_date and machine_deployment.end_date are between treatment_contract.start_date and treatment_contract.end_date, and that machine_deployment.site_id matches treatment_contract.site_id>
);
CREATE TABLE public.treatment_datapoint
(
site_id integer NOT NULL,
time_stamp timestamp NOT NULL,
PRIMARY KEY (site_id, time_stamp),
<some provision to require time_stamp is between treatment_contract.start_date and treatment_contract.end_date, and that treatment_datapoint.site_id matches treatment_contract.site_id>
);
CREATE TABLE public.machine
(
id integer NOT NULL,
PRIMARY KEY (id)
);
I'm not sure how to proceed because PostgreSQL can only enforce foreign key relationships where there is an exact match between all foreign key fields - there is no provision in foreign key constraints that can enforce something like child.timestamp BETWEEN parent.start AND parent.end
. treatment_datapoint
should have a foreign key to treatment_contract
, as a treatment_datapoint
without a treatment_contract
would make no sense, but there seems to be no way to enforce this foreign key relationship. Is the answer just to use triggers instead? I've always been told to avoid using triggers to define parent:child relationships, as that's what foreign keys are for.
Either way, though, there's got to be a way to model this, as I can't imagine that I'm the only one who's ever needed to enforce that a date within a child table is within a range defined in the parent table.
In short: to enforce relationship where there is no foreign key - make one.
For your model to work you have to have a foreign key to treatment_contract
and since the primary key of treatment_contract
contains fields site_id
, start_date
, end_date
you have to add the contract_start_date
and contract_end_date
to the tables you need to reference the contract, namely machine_deployment
and treatment_datapoint
.
To make your life easier I'd advice against using NULL for a not yet known end date of a contract and machine deployment. I would consider it to be a "magic number" that means "infinity". This is not required but makes checks simpler.
Also I'd add a check constraint to ensure a contract ends after it starts.
And lastly you can use a check constraint to validate deployment start and end and datapoint timestamp.
In the example bellow I use daterange and range operators in my checks. This is for convenience. You can achieve the same result with comparison operators (<
,<=
...).
My proposed variant of your schema is:
CREATE TABLE public.site
(
id integer NOT NULL,
PRIMARY KEY (id)
);
CREATE TABLE public.treatment_contract
(
site_id integer NOT NULL,
start_date date NOT NULL,
end_date date NOT NULL,
PRIMARY KEY (site_id, start_date, end_date),
CONSTRAINT fk_treatment_contract__site FOREIGN KEY (site_id)
REFERENCES public.site (id) MATCH SIMPLE
);
CREATE TABLE public.machine
(
id integer NOT NULL,
PRIMARY KEY (id)
);
CREATE TABLE public.machine_deployment
(
site_id integer NOT NULL,
machine_id integer NOT NULL,
contract_start_date date NOT NULL,
contract_end_date date NOT NULL,
start_date date NOT NULL,
end_date date NOT NULL,
PRIMARY KEY (site_id, machine_id, start_date, end_date),
CONSTRAINT fk_machine_deployment__machine FOREIGN KEY (machine_id)
REFERENCES public.machine (id) MATCH SIMPLE,
CONSTRAINT fk_machine_deployment__treatment_contract FOREIGN KEY (site_id, contract_start_date, contract_end_date)
REFERENCES public.treatment_contract(site_id, start_date, end_date),
CONSTRAINT chk_machine_deploiment_period CHECK (start_date <= end_date),
CONSTRAINT chk_machine_deploiment_in_contract CHECK (pg_catalog.daterange(start_date, end_date,'[]') <@ pg_catalog.daterange(contract_start_date, contract_end_date, '[]'))
);
CREATE TABLE public.treatment_datapoint
(
site_id integer NOT NULL,
contract_start_date date NOT NULL,
contract_end_date date NOT NULL,
time_stamp timestamp NOT NULL,
PRIMARY KEY (site_id, time_stamp),
CONSTRAINT fk_treatment_datapoint__treatment_contract FOREIGN KEY (site_id, contract_start_date, contract_end_date)
REFERENCES public.treatment_contract(site_id, start_date, end_date),
CONSTRAINT chk_datapoint_in_contract CHECK (time_stamp::date <@ pg_catalog.daterange(contract_start_date, contract_end_date, '[]'))
);