Search code examples
mysqlperformancesubqueryinner-join

how to avoid nested subqueries in SQL


I've just added a tagging system to my website and I'm trying to figure out the most efficient way to run scalable queries. Here's a basic working mysql query to return tag matches for a given user:

SELECT
   scans.scan_index,
   scans.scan_id,
   scans.archive_folder 
FROM
   tags 
   INNER JOIN
      interpretationtags USING (tagid) 
   INNER JOIN
      interpretations USING (interpretation_id) 
   INNER JOIN
      scans 
      ON scans.scan_id = interpretations.scan_id 
      AND scans.archive_folder = interpretations.archive_folder 
   INNER JOIN
      archives 
      ON scans.archive_folder = archives.archive_folder 
WHERE
   archives.user_id = "google-authd...." 
   AND tags.tag = "tag1"

But it gets sticky when I want to query multiple tags for the same scan. You see, tags are present in different interpretations, and there are multiple interpretations for each scan. Here's a working query for two tags using a subquery:

SELECT
   a.scan_index,
   a.scan_id,
   a.archive_folder 
FROM
   (
      SELECT
         scans.scan_index,
         scans.scan_id,
         scans.archive_folder 
      FROM
         tags 
         INNER JOIN
            interpretationtags USING (tagid) 
         INNER JOIN
            interpretations USING (interpretation_id) 
         INNER JOIN
            scans 
            ON scans.scan_id = interpretations.scan_id 
            AND scans.archive_folder = interpretations.archive_folder 
         INNER JOIN
            archives 
            ON scans.archive_folder = archives.archive_folder 
      WHERE
         archives.user_id = "google-auth2..." 
         AND tags.tag = "tag1"
   )
   as a 
   INNER JOIN
      interpretations 
      ON a.scan_id = interpretations.scan_id 
      AND a.archive_folder = interpretations.archive_folder 
   INNER JOIN
      interpretationtags USING(interpretation_id) 
   INNER JOIN
      tags USING(tagid) 
WHERE
   tags.tag = "tag2"

Since this is running on a LAMP stack, I've written some PHP code to iterate over the tags I'd like to include in this AND-style search, building a multi-nested query. Here's one with three

SELECT
   b.scan_index,
   b.scan_id,
   b.archive_folder 
FROM
   (
      SELECT
         a.scan_index,
         a.scan_id,
         a.archive_folder 
      FROM
         (
            SELECT
               scans.scan_index,
               scans.scan_id,
               scans.archive_folder 
            FROM
               tags 
               INNER JOIN
                  interpretationtags USING (tagid) 
               INNER JOIN
                  interpretations USING (interpretation_id) 
               INNER JOIN
                  scans 
                  ON scans.scan_id = interpretations.scan_id 
                  AND scans.archive_folder = interpretations.archive_folder 
               INNER JOIN
                  archives 
                  ON scans.archive_folder = archives.archive_folder 
            WHERE
               archives.user_id = "google..." 
               AND tags.tag = "tag1"
         )
         as a 
         INNER JOIN
            interpretations 
            ON a.scan_id = interpretations.scan_id 
            AND a.archive_folder = interpretations.archive_folder 
         INNER JOIN
            interpretationtags USING(interpretation_id) 
         INNER JOIN
            tags USING(tagid) 
      WHERE
         tags.tag = "tag2"
   )
   as b 
   INNER JOIN
      interpretations 
      ON b.scan_id = interpretations.scan_id 
      AND b.archive_folder = interpretations.archive_folder 
   INNER JOIN
      interpretationtags USING(interpretation_id) 
   INNER JOIN
      tags USING(tagid) 
WHERE
   tags.tag = "tag3"

Even 4 nested subqueries runs fast with minimal data, but I just don't see this being a scalable solution when I'm dealing with 100k rows of data. How can I accomplish this without reverting to this ugly inefficient code?


Solution

  • It's hard to be certain without table structures and sample data, but I think you're going about this in the wrong direction. You should start from scans and find all the appropriate tags, and then filter on those (which should then be a simple IN expression):

    SELECT
       scans.scan_index,
       scans.scan_id,
       scans.archive_folder 
    FROM
       scans
       INNER JOIN
          archives 
          ON scans.archive_folder = archives.archive_folder 
       INNER JOIN
          interpretations 
          ON scans.scan_id = interpretations.scan_id 
          AND scans.archive_folder = interpretations.archive_folder 
       INNER JOIN
          interpretationtags USING (interpretation_id) 
       INNER JOIN
          tags USING (tagid) 
    WHERE
       archives.user_id = "google-authd...." 
       AND tags.tag IN("tag1", "tag2")
    

    Note that based on your SELECT field list I don't think you actually need to JOIN to archives at all.