Search code examples
mysqlredisquery-optimizationprimary-keydatabase-indexes

Optimal MySQL table schema for given use case


I have two tables - books and images. The books table has many columns - including id (primary key), name (which is not unique), releasedate, etc. The images table have two columns - id (which is not unique, i.e one book id may have multiple images associated with it, and we need all those images. This column has a non-unique index), and poster (which is unique primary key, all images lie in the same bucket, hence cannot have duplicate names). My requirement is given a book name, find all images associated with it (along with the year of release and the bucketname for each image, the bucketname being just a number in this case).

I am running this query:

select books.id,poster,bucketname,year(releasedate) from books 
inner join images where images.bookId = books.id and books.name = "<name>";

A sample result set may look like this:

enter image description here

As you can see there are two results matching - one with id 2 and year 1989, having 5 images, other one with id 261009, year 2013 and one image.

The problem is, the query is extremely slow. It takes around .14 seconds from MySQL console itself, under zero load (in production there may be several concurrent requests and they may be queued, leading to further delay), which is unacceptable for autocomplete. Can anyone tell me how to optimize the query by adding correct indices/keys to the tables? If it is not possible from MySQL, suggestions regarding a proper Redis schema would be useful as well.

Edit: Approx no. of rows in images - 480k, in books - 285k. In future, autocomplete will show result for book authors as well as book names, hence the query will need to expand to take into account a separate table authors where each author will have an id and name, just like a book.


Solution

  • For optimal performance, you want suitable covering indexes available. For example:

    ... on `books` (`name`,`id`,`releasedate`)
    ... on `images` (`bookid`,`poster`,`bucketname`)
    

    We want name as the leading column in the index, because of the equality predicate in the WHERE clause. We want id and releasedate also included in the index to make it a "covering index", so the query can be satisfied from the index, without a need to visit pages of the underlying table to retrieve values.

    We want bookid as the leading column because of the reference in the ON clause. Again, having poster and bucketname available right in the index make it a "covering" index. Use EXPLAIN to see the query execution plan.

    Also, note that the inner join operation won't return a row from books if a matching row in images is not found. If we want to return a row from books even when no image is available, we could use an outer join.

    I'd write the query like this:

    SELECT b.id
         , i.poster
         , i.bucketname
         , YEAR(b.releasedate)
      FROM books b
      LEFT
      JOIN images i
        ON i.bookid = b.id
     WHERE b.name = ?