Search code examples
sqlpostgresqlmany-to-manyrelational-division

With a many-to-many relationship, search by the many for the one


I have a many-to-many relationship between releases and artifacts, where a given release is associated with multiple artifacts, and a given artifact is associated with multiple releases.

I understand how to model this: I have a releases table with an ID column:

CREATE TABLE releases (
    release_uuid uuid PRIMARY KEY
);

and an artifacts table with an ID column:

CREATE TABLE artifacts (
    artifact_uuid uuid PRIMARY KEY,
    hash          bytea
    -- other data
);

and a joining table release_artifacts that has foreign key columns from each of the others:

CREATE TABLE release_artifacts (
    id            serial PRIMARY KEY,
    release_uuid  uuid REFERENCES releases(release_uuid) NOT NULL,
    artifact_uuid uuid REFERENCES artifacts(artifact_uuid) NOT NULL,
    UNIQUE (release_uuid, artifact_uuid)
);

What I want to do is find a release "containing" a given set of artifacts, so that I can warn about duplicate releases. That is, for artifacts A1, A2, and A3, what release(s) Rx is defined by exactly those three artifacts? More visually, given the release_artifacts table:

release ID | artifact ID
-----------+------------
R1         | A1
R1         | A2
R1         | A3
R2         | A4
R2         | A2
R2         | A3

what search can I perform with A1, A2, A3 as the input that would give me back R1? A search on A2, A3 would return NULL. Or do I need a different model? I assume this would be easier if the release_artifacts table mapped a release to an array of artifact IDs, but then I lose the referential integrity with the artifact table.

I don't need maximum performance or maximal concurrency protection, but I'd be happy if those things don't significantly increase the complexity of the query. This is in a Postgres 9.6 database, though I'd consider that a version floor.


Solution

  • You can use aggregation:

    select release_id
    from release_artifacts
    group by release_id
    having sum( artifact_id in ('A1', 'A2', 'A3') ) = 3 and
           count(*) = 3;
    

    This assumes no duplicates.

    Or you can use string or array aggregation:

    select release_id
    from release_artifacts
    group by release_id
    having string_agg(artifact_id order by artifact_id) = 'A1,A2,A3';