Suppose I have records:
======= =========
Element id
======= =========
"H"
"O"
And another like:
======== ==
Compound id
======== ==
"Water"
With:
======== == =========== ========== ==========
Relation id compound_id element_id bond
======== == =========== ========== ==========
1 "Water" "H" "Covalent"
2 "Water" "H" "Covalent"
3 "Water" "O" "Covalent"
Now, most of my queries aren't for an exact match, but suppose sometimes I want to find the compound with the exact elements = ["H", "H", "O"]
(i.e. Water -- but not Hydroxide (["H", "O"]
) or Peroxide (["H", "H", "O", "O"]
).
How might I go about this?
GROUP_CONCAT
.It is always better to keep your database normalized. In your particular case, I would store the number of elements per compound instead of having a new row for each element.
compound_id element_id bond count
-------------------------------------------------
"Water" "H" "Covalent" 2
"Water" "O" "Covalent" 1
The query for the exact match will be
select compound_id
from elements
group by compound_id
having count(
case when
(element_id = 'H' and count = 2) or
(element_id = 'O' and count = 1) then 1
end
) = count(*)
However, this approach will be suboptimal, since a sequential scan will be used. If denormalization is not a problem, storing a number of different elements per compound can help.
compound_id element_count
------------------------------
"Water" 2
Then the query may be
select e.compound_id
from elements e
join compounds c on e.compound_id = c.compound_id
where c.element_count = 2 and
((e.element_id = 'H' and e.count = 2) or
(e.element_id = 'O' and e.count = 1))
group by e.compound_id
having count(*) = 2
and if you have an index on compounds(element_count)
and elements(element_id, count)
then your query will use it to retrieve the results fast even if the database is large.