Search code examples
mysqlsqlmany-to-many

SQL - Many to Many alternatives?


Suppose I have records:

======= =========
Element id
======= =========
        "H"
        "O"

And another like:

======== ==
Compound id
======== ==
         "Water"

With:

======== == =========== ========== ==========
Relation id compound_id element_id bond
======== == =========== ========== ==========
         1  "Water"     "H"        "Covalent"
         2  "Water"     "H"        "Covalent"
         3  "Water"     "O"        "Covalent"

Now, most of my queries aren't for an exact match, but suppose sometimes I want to find the compound with the exact elements = ["H", "H", "O"] (i.e. Water -- but not Hydroxide (["H", "O"]) or Peroxide (["H", "H", "O", "O"]).

How might I go about this?


Solution

  • It is always better to keep your database normalized. In your particular case, I would store the number of elements per compound instead of having a new row for each element.

     compound_id element_id      bond         count
     -------------------------------------------------
       "Water"     "H"        "Covalent"        2
       "Water"     "O"        "Covalent"        1
    

    The query for the exact match will be

     select compound_id
     from elements
     group by compound_id
     having count(
                  case when 
                    (element_id = 'H' and count = 2) or
                    (element_id = 'O' and count = 1) then 1 
                  end
            ) = count(*)
    

    However, this approach will be suboptimal, since a sequential scan will be used. If denormalization is not a problem, storing a number of different elements per compound can help.

     compound_id   element_count
     ------------------------------
       "Water"          2
    

    Then the query may be

     select e.compound_id
     from elements e
     join compounds c on e.compound_id = c.compound_id
     where c.element_count = 2 and
           ((e.element_id = 'H' and e.count = 2) or
            (e.element_id = 'O' and e.count = 1))
     group by e.compound_id
     having count(*) = 2
    

    and if you have an index on compounds(element_count) and elements(element_id, count) then your query will use it to retrieve the results fast even if the database is large.