Search code examples
mysqlsqlprerequisitesrelational-division

Compound course prerequisites (One or more of a,b,c and either x or y as well as z style)


Thanks everyone for the input, especially during the closing hours of the bounty, it's been incredible helpful.

This is a followup question to Select courses that are completely satisfied by a given list of prerequisites, and further explains the situation. It is definitely recommended to read to help understand this question further. (Courses and subjects are distinct entities, subjects are only prerequisites for courses and need not be prerequisites for other subjects - think high school subjects leading to possible university courses)

I have my database laid out as such.

   Prerequisite:
   +---------------+---------------+
   |      Id       |     Name      |         (Junction table)
   |---------------|---------------|         CoursePrerequisites:
   |      1        |   Maths       |         +---------------+---------------+
   |      2        |   English     |         |  Course_FK    | Prerequisite_FK
   |      3        |   Art         |         |---------------|---------------|
   |      4        |   Physics     |         |      1        |      1        |
   |      5        |   Psychology  |         |      1        |      2        |
   +-------------------------------+         |      2        |      3        |
                                             |      2        |      5        |
   Course:                                   |      5        |      4        |
   +---------------+---------------+         +---------------v---------------+
   |      Id       |     Name      |
   |---------------|---------------|
   |      1        |   Course1     |
   |      2        |   Course2     |
   |      3        |   Course3     |
   |      4        |   Course4     |
   |      5        |   Course5     |
   +---------------v---------------+

and I've been making use of the following query:

SELECT Course.id, course.Name, GROUP_CONCAT(DISTINCT Prerequisite.Name) AS 'Prerequisite Name(s)'
FROM Course
  LEFT JOIN CoursePrerequisites ON Course.id = CoursePrerequisites.Course_FK
  LEFT JOIN Prerequisite ON Prerequisite.id = CoursePrerequisites.Prerequisite_FK 
WHERE NOT EXISTS 
  (SELECT 1
    FROM CoursePrerequisites 
    WHERE Course.id = CoursePrerequisites.Course_FK
      AND CoursePrerequisites.Prerequisite_FK NOT IN (SELECT Prerequisite.id FROM Prerequisite Where Name = 'Art' OR Name = 'English' OR Name = 'Psychology''))
GROUP BY Course.id;

Which works well to select courses that are exactly filled by their prerequisites.

However, I've come to a roadblock trying to organise the database in such a way that is is able to represent courses with compound prerequisites. For example, a course may require English, Maths and either Art or Psychology. Another example may be prerequisites English and two of either Physics, Psychology, Art, etc.

What would be an appropriate way to structure the database to handle these types of prerequisites (I tried doing some searches, but I couldn't find anything (edit: found this, but not helpful: Modeling courses and pre-requisites in the database) and how would I modify the above query to again return only courses that have at least their prerequisites filled?

For clarification: Given a list of subjects (from Prerequisite table), I wish to return a list of Courses that would be eligible given those subjects. In the current database schema, given Maths, English, Art and Physics, returned courses should be Course1 and Course5 (and NOT Course2 - it has prerequisites Art and Psychology, the later of which is not satisfied by the given input) as stipulated by the junction table. I wish to extend the complexity of a Course's prerequisites from a simple 'AND' (Course1 requires Maths AND English) to something that can handle 'OR'/One of x from a set of y (e.g. Course1 now requires English, Maths AND One or more of Art or Psychology).

Progress Edit:

I've been thinking of extending the junction table with a few extra columns for 'at least one of the following' and 'at least two of', etc as well as another column for 'all of' and placing the prerequisites into a structure that way. Is this a sane way to go about this and what would be an efficient query in MySQL to query to find eligible courses given a list of subjects?

Progress:

Kuba Wyrostek has suggested below to enumerate all prerequisite combinations for each course into distinct sets. While this would work, I need to do this for ~6k rows, each with many enumerations. Is there a more efficient way to accomplish this?


Solution

  • In my opinion modeling conjunction and disjunction in one table is always uneasy and leads to either violation of normal form or inability to predict how many self joins are necessary. What I understand is that your prerequisites can be generally always expressed as alternatives of conjunctions. So the following:

    Math AND English AND (Physics1 OR Physics2)
    

    may be as well expressed as:

    (Math AND English AND Physics1) OR (Math AND English AND Physics2)
    

    This lead to a conclusion, that you probably need an intermediate table describing sets of prerequisites. A course is available when any of sets is successful, while set is successful when all of subjects in the set are completed.

    So the structure may look like this:

       Prerequisite:
       +---------------+---------------+
       |      Id       |     Name      |         
       |---------------|---------------|         PrerequisiteSets:
       |      1        |   Maths       |         +---------------+---------------+
       |      2        |   English     |         |  SetNumber    | Prerequisite_FK
       |      3        |   Art         |         |---------------|---------------|
       |      4        |   Physics     |         |      1        |      1        |
       |      5        |   Psychology  |         |      1        |      2        |
       +-------------------------------+         |      1        |      4        |
                                                 |      2        |      1        |
                                                 |      2        |      2        |
       Course:                                   |      2        |      5        |
       +---------------+---------------+         +---------------v---------------+
       |      Id       |     Name      |
       |---------------|---------------|
       |      1        |   Course1     |
       |      2        |   Course2     |
       |      3        |   Course3     |
       |      4        |   Course4     |
       |      5        |   Course5     |
       +---------------v---------------+
    
       CoursePrerequisite:                                
       +---------------+---------------+
       |  Course_FK    |  SetNumber    |
       |---------------|---------------|
       |      5        |       1       |
       |      5        |       2       |
       +---------------v---------------+
    

    An example Course5 can be satisfied with either SetNumber 1 (Math, English, Physics) or SetNumber 2 (Math, English, Psychology).

    Unfortunately it's too late here for me to help you with exact queries now, but in case you need it I can extend my answer tomorrow. Good luck though! :-)

    EDIT

    To generate queries I'd start with observation, that particular set is matched, when all prerequisites in set are a subset of given prerequisites. This leads to condition, that number of distinct prerequisites in set must match number of prerequisites in this set that are also in given set. Basically (assumming SetNumber-Prerequisite_FK is unique pair in table):

    select
      SetNumber,
      count(Prerequisite_FK) as NumberOfRequired,
      sum(case when Prerequisite.Name in ('Math','English','Art') then 1 else 0 end)
        as NumberOfMatching
    from PrerequisiteSets
      inner join Prerequisite on PrerequisiteSets.Prerequisite_FK = Prerequisite.ID
    group by SetNumber
    having
       count(Prerequisite_FK)
       =
       sum(case when Prerequisite.Name in ('Math','English','Art') then 1 else 0 end)
    

    Now getting final Courses boils down to getting all courses, which at least one set number is found in the results of query above. Starting like this (definitely can be expressed better and optimized with joins but general idea is the same):

    select Id, Name
    from Course
    where Id in
      (select Course_FK from CoursePrerequisite
       where SetNumber in
       (
          -- insert query from above (but only first column: SetNumber, skip the two latter)
       ) as MatchingSets
      ) as MatchingCourses