I have a complex database scheme for a dictionary. Each object (essentially a translation) is similar to this:
Entry {
keyword;
examples;
tags;
Translations;
}
with
Translation {
text;
tags;
examples;
}
and
Example {
text;
translation;
phonetic_script;
}
i.e. tags (i.e. grammar) can belong to either the keyword itself, or the translation (grammar of the foreign language), and similar examples can belong either to the translation itself (i.e. explaining the foreign word) or the the text in the entry. I ended up with such kind of relational design:
entries(id,keyword,)
tags(tag)
examples(id,text,...)
entrytags(entry_id,tag)
entryexamples(entry_id,example_id)
translations(id,belongs_to_entry,...)
translationtags(transl_id, tag)
translationexamples(transl_id,example_id)
My main task is querying this database. Say I search for "foo", my current way of handling is:
query all entries with foo, get ids A
foreach id in A
query all examples belonging to id
query all tags belonging to id
query all translations belonging to A, store their ids in B
foreach tr_id in B
query all tags belonging to tr_id
query all examples belonging to tr_id
to rebuild my objects. This looks cumbersome to me, and is slow. I do not see how i could significantly improve this by using joins, or otherwise. I have a hard time modeling these objects to relations in the database. Is this a proper design?
How can I make this more efficient to improve query time?
Each query being called in the loop takes at a minimum a certain base duration of time to execute, even for trivial queries. Many environment factors contribute to what this duration is but for now let's assume it's 10 milliseconds. If the first query matches 100 entries then there are at a minimum 301 total queries being called, each taking 10 ms, for a total of 3 seconds. The number of loop iterations varies which can contribute to a substantial variation in the performance.
Restructuring the queries with joins will create more complex queries but the total number of queries being called can be reduced down to a fixed number, 4 in the queries below. Suppose now that each query takes 50 ms to execute now that it is more complex and the total duration becomes 200 ms, a substantial decrease from 3000 ms.
The 4 queries show below should come close to achieving the desired result. There are other ways to write the queries such as using subquery or including the tables in the FROM clause but these show how to do it with JOINs. The condition entries.keyword = 'foo'
is used to represent the condition in the original query to select the entries.
It is worth noting that if the foo
condition on entries
is very expensive to compute then other optimizations may be needed to further improve performance. In these examples the condition is a simple comparison which is quick to lookup in an index but using LIKE
which may require a full table scan may not work well with these queries.
The following query selects all examples matching the original query. The condition from the original query is expressed as a WHERE
clause on the entries.keyword
column.
SELECT entries.id, examples.text
FROM entries
INNER JOIN entryexamples
ON (entries.id = entryexamples.entry_id)
INNER JOIN examples
ON (entryexamples.example_id = examples.id)
WHERE entries.keyword = 'foo';
This query selects tags matching the original query. Only two joins are used in this case because the entrytags.tag
column is what is needed and joining with tags
would only provide the same value.
SELECT entries.id, entrytags.tag
FROM entries
INNER JOIN entrytags
ON (entries.id = entrytags.entry_id)
WHERE entries.keyword = 'foo'';
This query selects the translation tags for the original query. This is similar to the previous query to select the entrytags
but another layer of joins is used here for the translations.
SELECT entries.id, translationtags.tag
FROM entries
INNER JOIN translations
ON (entries.id = translations.belongs_to_entry)
INNER JOIN translationtags
ON (translations.id = translationtags.transl_id)
WHERE entries.keyword = 'foo';
The final query does the same as the first query for the examples
but also includes the additional joins. It's getting to be a lot of joins but in general should perform significantly better than looping through and executing individual queries.
SELECT entries.id, examples.text
FROM entries
INNER JOIN translations
ON (entries.id = translations.belongs_to_entry)
INNER JOIN translationexamples
ON (translations.id = translationexamples.transl_id)
INNER JOIN examples
ON (translationexamples.example_id = examples.id)
WHERE entries.keyword = 'foo';