Tags: mysql, sql, database, database-design, many-to-many

Canonical way to prevent a many-to-many join table from having duplicates when a table is joined with itself


I'm modeling a school for a databases project, and I have a many-to-many relationship that joins a course table with itself to establish prerequisites. The table (simplified) looks like this:

CREATE TABLE course_prereqs (
  course_id INT NOT NULL,
  prereq_id INT NOT NULL,

  PRIMARY KEY (course_id, prereq_id),
  FOREIGN KEY (course_id) REFERENCES course (course_id),
  FOREIGN KEY (prereq_id) REFERENCES course (course_id)
);

What I'm wondering is the simplest, but also most user-friendly, way to prevent both (1, 2) and (2, 1) from being in the table, because together they would create a cyclical dependency.
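A minimal sketch of the problem, assuming an in-memory SQLite database (through Python's built-in sqlite3 module) as a stand-in for MySQL so the example runs without a server. The composite primary key rejects an exact duplicate row, but it accepts the reverse pair without complaint.

```python
import sqlite3

# In-memory SQLite standing in for MySQL (assumption: illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE course (course_id INTEGER PRIMARY KEY)")
conn.execute("""
    CREATE TABLE course_prereqs (
      course_id INT NOT NULL,
      prereq_id INT NOT NULL,
      PRIMARY KEY (course_id, prereq_id),
      FOREIGN KEY (course_id) REFERENCES course (course_id),
      FOREIGN KEY (prereq_id) REFERENCES course (course_id)
    )""")
conn.executemany("INSERT INTO course (course_id) VALUES (?)", [(1,), (2,)])

conn.execute("INSERT INTO course_prereqs VALUES (1, 2)")  # course 1 requires course 2
conn.execute("INSERT INTO course_prereqs VALUES (2, 1)")  # reverse pair: also accepted

# An exact duplicate, by contrast, violates the primary key.
try:
    conn.execute("INSERT INTO course_prereqs VALUES (1, 2)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

rows = conn.execute(
    "SELECT course_id, prereq_id FROM course_prereqs ORDER BY course_id"
).fetchall()
print(rows)                # [(1, 2), (2, 1)]
print(duplicate_rejected)  # True
```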

I was thinking something equivalent to

-- in a trigger, though, since MySQL doesn't enforce CHECK (before 8.0.16 it parses but ignores it)
CHECK (course_id < prereq_id)

but while that works, I feel it could be annoying for the end user to always have to make sure that course_id is less than prereq_id.
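The CHECK idea from the question can be tried out in a small sketch. It assumes SQLite (via Python's sqlite3) as a stand-in for MySQL; SQLite enforces CHECK constraints, as MySQL does from 8.0.16 on. The reverse pair is rejected, but so is any prerequisite whose id happens to be smaller than the id of the course that requires it. Because every stored row must point from a smaller id to a larger one, no cycle of any length can form. The price is that ids have to be assigned so that every prerequisite is numbered higher than the courses that need it.

```python
import sqlite3

# In-memory SQLite standing in for MySQL; foreign keys omitted to keep it short.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE course_prereqs (
      course_id INT NOT NULL,
      prereq_id INT NOT NULL,
      PRIMARY KEY (course_id, prereq_id),
      CHECK (course_id < prereq_id)   -- enforced by SQLite; by MySQL only from 8.0.16
    )""")

def try_insert(course_id: int, prereq_id: int) -> bool:
    """Return True if the row was stored, False if a constraint rejected it."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute("INSERT INTO course_prereqs VALUES (?, ?)",
                         (course_id, prereq_id))
        return True
    except sqlite3.IntegrityError:
        return False

results = [try_insert(1, 2),   # accepted
           try_insert(2, 1),   # the reverse pair: rejected by the CHECK
           try_insert(5, 3)]   # a real prerequisite with a smaller id: also rejected
print(results)  # [True, False, False]
```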

Is there a better way, or am I overestimating the inconvenience of having to make sure the course_id is always smaller?


Solution

  • This seems like an application problem. Once you have said "course #1 is a 'prerequisite' for course #2", it is incorrect to later say "course #2 is a prerequisite for #1". So why do you need the database to check for something that should never happen?

    If, on the other hand, the 'relation' were symmetric (1->2 implies 2->1), then I would do this:

    -- tbl is a placeholder for the join table; $this and $that are the two ids
    INSERT IGNORE INTO tbl (a, b) VALUES (LEAST($this, $that), GREATEST($this, $that));
    

    so that there is a 'canonical' order for storing the relationship. Testing whether two items are related becomes messier, because every lookup has to apply the same LEAST/GREATEST ordering.

    Or... store both directions. It is only twice as many rows, and lookup performance does not suffer noticeably.

    In both cases, you need two indexes:

    PRIMARY KEY(a, b),
    INDEX      (b, a)
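Following the answer's first point, that this is the application's job, here is a minimal sketch of an application-side guard. It assumes SQLite (via Python's sqlite3) as a stand-in for MySQL, and `would_create_cycle` and `add_prereq` are hypothetical helper names. Before storing "course requires prereq", it walks the prerequisites of the proposed prerequisite with a recursive CTE and refuses the insert if the course turns up, which catches the (1, 2)/(2, 1) case as well as longer cycles. MySQL 8.0 also supports WITH RECURSIVE, so the query should carry over with little change, but treat that as an assumption.

```python
import sqlite3

# In-memory SQLite standing in for MySQL (assumption, so the sketch runs anywhere).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE course_prereqs (
      course_id INT NOT NULL,
      prereq_id INT NOT NULL,
      PRIMARY KEY (course_id, prereq_id)
    )""")

def would_create_cycle(course_id: int, prereq_id: int) -> bool:
    """Hypothetical helper: True if prereq_id is course_id itself or already
    requires course_id, directly or through a chain of prerequisites."""
    row = conn.execute("""
        WITH RECURSIVE reqs(id) AS (
          SELECT ?                -- start at the proposed prerequisite
          UNION                   -- UNION (not UNION ALL) drops repeats, so it terminates
          SELECT p.prereq_id      -- then follow its prerequisites, and theirs
          FROM course_prereqs AS p JOIN reqs ON p.course_id = reqs.id
        )
        SELECT EXISTS (SELECT 1 FROM reqs WHERE id = ?)
    """, (prereq_id, course_id)).fetchone()
    return bool(row[0])

def add_prereq(course_id: int, prereq_id: int) -> bool:
    """Hypothetical helper: store the row unless it would close a cycle."""
    if would_create_cycle(course_id, prereq_id):
        return False
    with conn:
        conn.execute("INSERT OR IGNORE INTO course_prereqs VALUES (?, ?)",
                     (course_id, prereq_id))
    return True

results = [add_prereq(1, 2),   # 1 requires 2
           add_prereq(2, 3),   # 2 requires 3
           add_prereq(2, 1),   # reverse of the first row: refused
           add_prereq(3, 1),   # would close 1 -> 2 -> 3 -> 1: refused
           add_prereq(4, 1)]   # unrelated course: fine
print(results)  # [True, True, False, False, True]
```

The check and the insert are separate statements, so two concurrent writers could still slip a cycle through; for a single-user project that is usually acceptable.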
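For the symmetric case, the canonical-order approach and the two indexes can be sketched together. SQLite (via Python's sqlite3) stands in for MySQL here: its two-argument min()/max() play the role of LEAST()/GREATEST(), and INSERT OR IGNORE is its spelling of INSERT IGNORE. The table name course_links and the helpers link, linked and partners are hypothetical. Because each pair is stored only once, finding all partners of a course has to look in both columns, which is where the (b, a) index comes in.

```python
import sqlite3

# SQLite stand-in for MySQL; course_links is a hypothetical table name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE course_links (a INT NOT NULL, b INT NOT NULL, PRIMARY KEY (a, b))")
conn.execute("CREATE INDEX course_links_b_a ON course_links (b, a)")  # the INDEX (b, a)

def link(x: int, y: int) -> None:
    # Store each pair once, smaller id first (LEAST/GREATEST in MySQL).
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO course_links (a, b) VALUES (min(?, ?), max(?, ?))",
            (x, y, x, y),
        )

def linked(x: int, y: int) -> bool:
    # Lookups must apply the same ordering: the "messier" part.
    row = conn.execute(
        "SELECT EXISTS (SELECT 1 FROM course_links WHERE a = min(?, ?) AND b = max(?, ?))",
        (x, y, x, y),
    ).fetchone()
    return bool(row[0])

def partners(x: int) -> list:
    # x may sit in either column: the first branch can use PRIMARY KEY (a, b),
    # the second the (b, a) index.
    sql = """
        SELECT b FROM course_links WHERE a = ?
        UNION
        SELECT a FROM course_links WHERE b = ?
        ORDER BY 1
    """
    return [r[0] for r in conn.execute(sql, (x, x))]

link(2, 1)
link(1, 2)  # same pair in the other order: ignored
link(2, 3)
rows = conn.execute("SELECT a, b FROM course_links ORDER BY a, b").fetchall()
print(rows)                        # [(1, 2), (2, 3)]
print(linked(1, 2), linked(1, 3))  # True False
print(partners(2))                 # [1, 3]
```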
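The "store both directions" alternative from the answer, sketched the same way: SQLite via Python's sqlite3 as a stand-in for MySQL, with course_links, link and partners as hypothetical names. Each link writes two rows in one transaction, so the row count doubles, but a partner lookup needs no LEAST/GREATEST juggling.

```python
import sqlite3

# SQLite stand-in for MySQL; course_links is a hypothetical table name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE course_links (a INT NOT NULL, b INT NOT NULL, PRIMARY KEY (a, b))")

def link(x: int, y: int) -> None:
    # Store the relationship in both directions inside one transaction.
    with conn:
        conn.executemany("INSERT OR IGNORE INTO course_links (a, b) VALUES (?, ?)",
                         [(x, y), (y, x)])

def partners(x: int) -> list:
    # Both directions are stored, so querying column a alone finds every partner.
    return [r[0] for r in conn.execute(
        "SELECT b FROM course_links WHERE a = ? ORDER BY b", (x,))]

link(1, 2)
link(1, 3)
link(2, 1)  # already stored both ways: ignored
count = conn.execute("SELECT COUNT(*) FROM course_links").fetchone()[0]
print(count)                     # 4
print(partners(1), partners(2))  # [2, 3] [1]
```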